On March 15, the Sloan Digital Sky Survey (SDSS) released its latest data set, Data Release 2, to researchers and the general public. The data set contains over six terabytes of images and the properties of more than 88 million celestial objects. The data is available on the web at www.sdss.org/DR2 or in a more public-friendly format at the SkyServer site. Visitors can pan and zoom around the universe using a sort of celestial version of Mapquest and click on an object to find out the properties of a star, galaxy or quasar.
While this can be fun, from a scientific viewpoint the most important feature is the ability to query the data set for objects that meet the requirements of a research project. Meeting this demand required a fundamental change in the way astronomical information is normally stored and managed.
"The volume of data we were projecting was so large that the traditional methods that scientists were using wouldn't cut it any more," says Johns Hopkins University associate research scientist Ani Thakar.
The SDSS is a project of the Astrophysical Research Consortium, a group of more than 200 astronomers at 13 institutions around the world. Its multi-year goal is to map one-quarter of the sky and determine the brightness and position of several hundred million objects in it. It gathers information using a 2.5-meter telescope at the Apache Point Observatory in New Mexico. The telescope contains one of the largest imaging cameras in the world. While a typical large telescope contains a single CCD chip, the SDSS camera contains an array of thirty 4-megapixel chips.
Every two weeks, SDSS FedExes the raw imaging data to the U.S. Department of Energy's Fermi National Accelerator Laboratory in Batavia, Illinois, for processing. There it is analyzed, calibrated, put into ASCII CSV (comma-separated values) format and shipped to SDSS to add to its database. Using a database is a change from the usual way astronomical data is managed.
"The whole idea of putting them into databases is first of all to ensure the integrity of the data, be able to back out changes and things like that," Thakar explains. "The other big thing is to provide fast access to the data."
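For readers who want a concrete picture of that workflow, the following is a minimal Python sketch, not the SDSS pipeline itself. It uses Python's built-in sqlite3 module as a stand-in database, and the table name, column names and sample rows are all hypothetical. It loads a few CSV rows into a table, queries for objects that match a research criterion, and backs out a mistaken change with a rollback.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows; the columns and values are illustrative only
# and do not reflect the real SDSS schema.
SAMPLE_CSV = """obj_id,ra_deg,dec_deg,obj_type,r_mag
1,180.10,0.50,galaxy,17.2
2,180.20,0.60,star,15.1
3,180.30,0.70,quasar,19.4
4,180.40,0.80,galaxy,18.9
"""


def load_catalog(conn, csv_text):
    """Create the table, bulk-insert rows parsed from CSV text, and index them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects ("
        "obj_id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL, "
        "obj_type TEXT, r_mag REAL)"
    )
    rows = [
        (int(r["obj_id"]), float(r["ra_deg"]), float(r["dec_deg"]),
         r["obj_type"], float(r["r_mag"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?, ?)", rows)
    # An index on the filtered columns is what keeps lookups fast as the table grows.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_type_mag ON objects (obj_type, r_mag)"
    )
    conn.commit()
    return len(rows)


def find_objects(conn, obj_type, max_mag):
    """Return IDs of objects of one type brighter than max_mag (smaller is brighter)."""
    cur = conn.execute(
        "SELECT obj_id FROM objects "
        "WHERE obj_type = ? AND r_mag < ? ORDER BY obj_id",
        (obj_type, max_mag),
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
loaded = load_catalog(conn, SAMPLE_CSV)
bright_galaxies = find_objects(conn, "galaxy", 18.0)

# Back out a mistaken change: the delete is undone by the rollback.
conn.execute("DELETE FROM objects WHERE obj_type = 'galaxy'")
conn.rollback()
count_after_rollback = conn.execute("SELECT COUNT(*) FROM objects").fetchone()[0]
```

The sketch touches both points Thakar raises: the rollback shows how a database can undo a bad change without corrupting the catalog, and the index shows one common way a database speeds up selective queries.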