
The Universe at Your Fingertips


Written By

Drew Robb

Apr 26, 2004



The usual image of an astronomer shows someone peering through the eyepiece of a large telescope observing the heavens. In reality, they are more likely to be peering at a computer screen, running queries and simulations, or studying the digitized output of telescopes on the other side of the world.

On March 15, the Sloan Digital Sky Survey (SDSS) released its latest data set, Data Release 2, to researchers and the general public. This data set contains over six terabytes of images and the properties of more than 88 million celestial objects. It is available on the web at www.sdss.org/DR2, or in a more public-friendly format at the SkyServer site. Visitors can pan and zoom around the universe using a sort of celestial version of MapQuest and click on an object to find out the properties of a star, galaxy or quasar.

While this can be fun, from a scientific viewpoint the most important feature is the ability to query the data set for objects that meet the requirements of a research project. Meeting that demand required a fundamental change in the way astronomical data is normally stored and managed.

“The volume of data we were projecting was so large that the traditional methods that scientists were using wouldn’t cut it any more,” says Johns Hopkins University associate research scientist Ani Thakar.

Astronomers United

The SDSS is a project of the Astrophysical Research Consortium, a group of more than 200 astronomers at 13 institutions around the world. Its multi-year goal is to map one-quarter of the sky and determine the brightness and position of several hundred million objects in it. The survey gathers its data using a 2.5-meter telescope at the Apache Point Observatory in New Mexico, which carries one of the largest imaging cameras in the world. While a typical large telescope uses a single CCD chip, the SDSS camera contains an array of thirty 4-megapixel chips.

Every two weeks, SDSS sends the raw imaging data by FedEx to the U.S. Department of Energy’s Fermi National Accelerator Laboratory in Batavia, Illinois, for processing. There it is analyzed, calibrated, converted to ASCII CSV (comma-separated values) format and shipped to SDSS to be added to its database. Using a database is a departure from the way astronomical data is usually managed.

“The whole idea of putting them into databases is first of all to ensure the integrity of the data, be able to back out changes and things like that,” Thakar explains. “The other big thing is to provide fast access to the data.”
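The article does not describe the SDSS loading code itself. As a minimal sketch of the integrity idea Thakar describes, assuming Python's built-in sqlite3 module as a stand-in for SQL Server 2000 and a hypothetical four-column table (`photo_obj` with `obj_id`, `ra`, `dec`, `mag_r`; the real SDSS schema is far larger), the following loads one CSV batch inside a single transaction, so that one bad row backs out the whole batch:

```python
import csv
import io
import sqlite3

# Hypothetical, heavily simplified table; not the real SDSS schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS photo_obj (
    obj_id INTEGER PRIMARY KEY,
    ra     REAL NOT NULL,  -- right ascension, degrees
    dec    REAL NOT NULL,  -- declination, degrees
    mag_r  REAL            -- r-band magnitude
)
"""


def load_batch(conn: sqlite3.Connection, csv_text: str) -> int:
    """Insert one CSV batch atomically and return the number of rows loaded.

    If any row fails to parse or violates a constraint (e.g. a duplicate
    obj_id), the whole batch is rolled back and the table is unchanged.
    """
    count = 0
    with conn:  # commits if the block succeeds, rolls back if it raises
        for row in csv.DictReader(io.StringIO(csv_text)):
            conn.execute(
                "INSERT INTO photo_obj (obj_id, ra, dec, mag_r) VALUES (?, ?, ?, ?)",
                (int(row["obj_id"]), float(row["ra"]),
                 float(row["dec"]), float(row["mag_r"])),
            )
            count += 1
    return count


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    good = "obj_id,ra,dec,mag_r\n1,180.0,0.5,17.2\n2,180.1,0.6,19.8\n"
    bad = "obj_id,ra,dec,mag_r\n3,181.0,1.0,18.0\n4,181.1,1.1,oops\n"
    print(load_batch(conn, good))  # 2
    try:
        load_batch(conn, bad)  # row 3 is inserted, then row 4 fails to parse
    except ValueError:
        pass
    # Row 3 was rolled back together with the failing row 4.
    print(conn.execute("SELECT COUNT(*) FROM photo_obj").fetchone()[0])  # 2
```

The transaction is what makes it possible to "back out changes": either every row in a delivery lands in the table or none does, so a half-loaded batch never reaches users.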

Moving Past FITS

Normally the data from a telescope is recorded in FITS (Flexible Image Transport System) files, a binary file format used extensively for astronomy data. While FITS is adequate for small batches of information, it is too cumbersome for rapid data access when the SDSS data store will eventually hold hundreds of millions of records.
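To make the format concrete: a FITS header is a run of 80-character ASCII records ("cards") packed into 2880-byte blocks and terminated by an `END` card, with the binary data following in further 2880-byte blocks. The simplified reader below is an illustration only (the helper names are hypothetical, and production code would use a full library such as astropy.io.fits). It shows that getting at anything in a file means walking these fixed-width records in order, which is fine for one image but not for searching hundreds of millions of objects:

```python
from typing import Dict, Optional

BLOCK = 2880  # FITS files are organised in 2880-byte blocks
CARD = 80     # each header record ("card") is 80 ASCII characters


def make_card(keyword: str, value: Optional[str] = None) -> bytes:
    """Build one 80-character card: KEYWORD in columns 1-8, '= ' in 9-10."""
    text = keyword.ljust(8)
    if value is not None:
        text += "= " + value.rjust(20)
    return text.ljust(CARD).encode("ascii")


def parse_header(raw: bytes) -> Dict[str, str]:
    """Read cards in order until END; return keyword -> raw value string.

    Simplified: drops text after '/' (so a '/' inside a quoted string value
    would be cut) and ignores COMMENT/HISTORY and long-string conventions.
    """
    header: Dict[str, str] = {}
    for start in range(0, len(raw), CARD):
        card = raw[start:start + CARD].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            return header
        if card[8:10] == "= ":  # value indicator
            header[keyword] = card[10:].split("/", 1)[0].strip()
    raise ValueError("no END card found")


def data_offset(raw: bytes) -> int:
    """Byte offset where the data after this header starts.

    The header, up to and including END, is padded to whole 2880-byte blocks.
    """
    for start in range(0, len(raw), CARD):
        if raw[start:start + 8].strip() == b"END":
            cards_used = start // CARD + 1
            return (cards_used * CARD + BLOCK - 1) // BLOCK * BLOCK
    raise ValueError("no END card found")


if __name__ == "__main__":
    cards = [make_card("SIMPLE", "T"), make_card("BITPIX", "16"),
             make_card("NAXIS", "0"), make_card("END")]
    raw = b"".join(cards).ljust(BLOCK, b" ")  # pad header to one full block
    print(parse_header(raw))  # {'SIMPLE': 'T', 'BITPIX': '16', 'NAXIS': '0'}
    print(data_offset(raw))   # 2880
```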

“In order for you to search for objects that were of interest for your research would take hours, maybe days,” Thakar continues.

SDSS started out using an object-oriented database (OODB), but it did not meet the performance requirements, so the project decided to switch to a relational database.

Jim Gray, a “distinguished engineer” in Microsoft’s Scalable Servers Research Group and manager of the company’s Bay Area Research Center in San Francisco, California, helped SDSS move to Microsoft’s SQL Server 2000. The database resides on a series of off-the-shelf RAID 0/5 arrays with a total cost of under $10,000. The SQL database came online with the Early Data Release in June 2001. Initially, SQL Server was used only for public access, while scientists continued to use the OODB.

But that didn’t last for long.

“In the first six months, the SQL database stole the show,” says Thakar. “It was so much faster and easier to use that many of the scientists started using it too.”

As a result, everything was moved over to SQL Server.

The SkyServer site offers visitors several ways to get data, depending on their level of expertise. Anyone can use the form-based queries. Hard-core users can run SQL queries directly, or submit a query as a batch job and come back later to view the results. Users can download their results in text, CSV or XML format. Visitors can also use a graphical interface to locate an area of the sky, zoom in and click on a particular object to find out its properties.
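A hard-core user's query usually boils down to a filter on object properties. The sketch below assumes Python's sqlite3 in place of SQL Server, a hypothetical `photo_obj` table holding a few made-up rows (not SDSS data), and hypothetical helper names. It runs a position-and-brightness cut and writes the results as CSV and as XML with the standard library; the XML element names are an assumption, and SkyServer's actual output layout may differ.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET
from typing import List, Sequence, Tuple


def sample_connection() -> sqlite3.Connection:
    """In-memory stand-in for the catalog, filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE photo_obj (obj_id INTEGER, ra REAL, dec REAL, mag_r REAL)")
    conn.executemany(
        "INSERT INTO photo_obj VALUES (?, ?, ?, ?)",
        [(1, 180.0, 0.5, 17.2), (2, 180.1, 0.6, 19.8), (3, 200.0, 5.0, 16.0)],
    )
    return conn


# Objects inside a small box of sky that are brighter than a magnitude
# limit (a smaller magnitude means a brighter object).
QUERY = """
SELECT obj_id, ra, dec, mag_r FROM photo_obj
WHERE ra BETWEEN ? AND ? AND dec BETWEEN ? AND ? AND mag_r < ?
ORDER BY obj_id
"""


def run_query(conn: sqlite3.Connection, sql: str,
              params: Sequence = ()) -> Tuple[List[str], List[tuple]]:
    """Run a parameterized query; return (column names, result rows)."""
    cur = conn.execute(sql, params)
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


def to_csv(columns: List[str], rows: List[tuple]) -> str:
    """Header line plus one comma-separated line per row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()


def to_xml(columns: List[str], rows: List[tuple]) -> str:
    """One <row> element per result, one child element per column."""
    root = ET.Element("results")
    for row in rows:
        item = ET.SubElement(root, "row")
        for name, value in zip(columns, row):
            ET.SubElement(item, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    cols, rows = run_query(sample_connection(), QUERY, (179.5, 180.5, 0.0, 1.0, 18.0))
    print(to_csv(cols, rows), end="")
    print(to_xml(cols, rows))
```

Only object 1 passes both the sky-box and brightness cuts here. Keeping the query separate from the output formatting mirrors the site's design, where the same result set can be delivered as text, CSV or XML.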

So far, more than 200 papers have been published based on SDSS data, and many more are expected as the database speeds up the research process.

“Being able to pose questions in a few hours and get answers in a few minutes changes the way one views the data: you can experiment interactively,” Microsoft’s Jim Gray and Johns Hopkins University astronomy professor Alex Szalay wrote in their paper “The World-Wide Telescope, an Archetype for Online Science.” “When queries take three days and hundreds of lines of code, one asks many fewer questions and so gets fewer answers.”


Drew Robb

Drew Robb is a contributing writer for Datamation, Enterprise Storage Forum, eSecurity Planet, Channel Insider, and eWeek. He has been reporting on all areas of IT for more than 25 years. He has a degree from the University of Strathclyde UK (USUK), and lives in the Tampa Bay area of Florida.
