Paper: Cloud Based Processing of Large Photometric Surveys
Volume: 475, Astronomical Data Analysis Software and Systems XXII
Page: 91
Authors: Farivar, R.; Brunner, R. J.; Santucci, R.; Campbell, R.
Abstract: Astronomy, like many other scientific domains, has become a data-rich science. Nowhere is this more evident than in the growth of large-area surveys, such as the recently completed Sloan Digital Sky Survey (SDSS) or the Dark Energy Survey, which will soon obtain petabytes of imaging data. Processing the data from these large surveys is a major challenge. In this paper, we demonstrate a new approach to this common problem. We propose the use of cloud-based technologies (e.g., Hadoop MapReduce) to run a data analysis program (e.g., SExtractor) across a cluster. Using Hadoop's intermediate key/value pair design, our framework matches objects across different SExtractor invocations to create a unified catalog from all processed SDSS data. We conclude by presenting our experimental results on a 432-core cluster and discussing the lessons we learned in completing this challenge.
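The abstract does not say how the intermediate keys are chosen, so the following is only a minimal sketch of the general idea, assuming a simple fixed RA/Dec grid cell as the key. The mapper keys each SExtractor-style detection by the sky cell it falls in. The grouping step then brings detections from different invocations that share a cell together, and the reducer merges those lying within a matching radius into one catalog entry. The detection tuple layout, `CELL_DEG`, `MATCH_RADIUS_ARCSEC`, and every function name are hypothetical. The sketch ignores RA wrap-around at 0/360 degrees and does not match detections that straddle a cell boundary.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical detection record: (invocation_id, ra_deg, dec_deg, mag).
Detection = Tuple[str, float, float, float]

CELL_DEG = 0.1             # hypothetical grid-cell size used as the intermediate key
MATCH_RADIUS_ARCSEC = 1.0  # hypothetical positional matching tolerance


def map_detection(det: Detection) -> Tuple[Tuple[int, int], Detection]:
    """Map step: emit (sky-cell key, detection)."""
    _, ra, dec, _ = det
    key = (math.floor(ra / CELL_DEG), math.floor(dec / CELL_DEG))
    return key, det


def separation_arcsec(a: Detection, b: Detection) -> float:
    """Flat small-angle separation; adequate for arcsecond matching away from the poles."""
    dec_mid = math.radians((a[2] + b[2]) / 2.0)
    dra = (a[1] - b[1]) * math.cos(dec_mid)
    ddec = a[2] - b[2]
    return math.hypot(dra, ddec) * 3600.0


def reduce_cell(dets: List[Detection]) -> List[List[Detection]]:
    """Reduce step: greedily attach each detection to the first group whose
    first member lies within the match radius, else start a new group."""
    groups: List[List[Detection]] = []
    for det in dets:
        for group in groups:
            if separation_arcsec(group[0], det) <= MATCH_RADIUS_ARCSEC:
                group.append(det)
                break
        else:
            groups.append([det])
    return groups


def run_job(detections: Iterable[Detection]) -> List[Tuple[float, float, float, int]]:
    """Simulate map -> shuffle -> reduce in one process.

    Returns (mean_ra, mean_dec, mean_mag, n_detections) per matched object.
    """
    shuffled: Dict[Tuple[int, int], List[Detection]] = defaultdict(list)
    for det in detections:
        key, value = map_detection(det)
        shuffled[key].append(value)  # stands in for Hadoop's shuffle/sort by key

    catalog = []
    for key in sorted(shuffled):
        for group in reduce_cell(shuffled[key]):
            n = len(group)
            ra = sum(d[1] for d in group) / n
            dec = sum(d[2] for d in group) / n
            mag = sum(d[3] for d in group) / n
            catalog.append((ra, dec, mag, n))
    return catalog


if __name__ == "__main__":
    demo = [
        ("run1", 150.03, 2.03, 18.2),
        ("run2", 150.0301, 2.03005, 18.4),  # about 0.4 arcsec from the first: same object
        ("run1", 150.07, 2.07, 20.1),       # far away: a separate object
    ]
    for row in run_job(demo):
        print(row)
```

Under Hadoop Streaming, `map_detection` and `reduce_cell` would become separate mapper and reducer scripts reading tab-separated lines from standard input, and the framework would perform the grouping that the in-memory dictionary imitates here. A production job would also emit each detection to neighbouring cells, or use a hierarchical sky pixelization, and then deduplicate, so that objects near cell edges are still matched.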