Paper: A Framework for Analyzing Massive Astrophysical Datasets on a Distributed Grid
Volume: 376, Astronomical Data Analysis Software and Systems XVI
Page: 69
Authors: Gardner, J.P.; Connolly, A.; McBride, C.
Abstract: Virtual observatories will give astronomers easy access to an unprecedented amount of data. Extracting scientific knowledge from these data will increasingly demand both efficient algorithms and the power of parallel computers. Such machines will range in size from small Beowulf clusters to large, massively parallel platforms (MPPs) to collections of MPPs distributed across a Grid, such as the NSF TeraGrid facility. Nearly all efficient analyses of large astronomical datasets use trees as their fundamental data structure. Writing efficient tree-based techniques, a task that is time-consuming even on single-processor computers, is exceedingly cumbersome on parallel or grid-distributed resources. We have developed a framework, Ntropy, that provides a flexible, extensible, and easy-to-use way of developing tree-based data analysis algorithms for both serial and parallel platforms. Our experience has shown that not only does our framework save development time, it also delivers an increase in serial performance. Furthermore, our framework makes it easy for an astronomer with little or no parallel programming experience to scale their application quickly to a distributed, multi-processor environment. By minimizing development time for efficient and scalable data analysis, we will enable wide-scale knowledge discovery on massive datasets.
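To illustrate what a tree-based analysis of the kind the abstract mentions can look like, here is a minimal serial sketch in Python: a kd-tree over points, plus a count of the points within a radius of a query position that skips or accepts whole subtrees by testing their bounding boxes. This is not Ntropy's API, which the abstract does not describe. The names `Node`, `dist2`, `count_within`, and `leaf_size` are hypothetical and chosen only for this example.

```python
class Node:
    """Hypothetical kd-tree node: a leaf holding points, or an internal split."""

    def __init__(self, points, depth=0, leaf_size=8):
        dim = len(points[0])
        # Bounding box of every point under this node, used for pruning.
        self.lo = [min(p[d] for p in points) for d in range(dim)]
        self.hi = [max(p[d] for p in points) for d in range(dim)]
        self.count = len(points)
        self.points = None
        self.left = None
        self.right = None
        if len(points) <= max(leaf_size, 1):
            self.points = list(points)  # leaf: keep the points themselves
            return
        # Internal node: split at the median along a cycling axis.
        axis = depth % dim
        ordered = sorted(points, key=lambda p: p[axis])
        mid = len(ordered) // 2
        self.left = Node(ordered[:mid], depth + 1, leaf_size)
        self.right = Node(ordered[mid:], depth + 1, leaf_size)


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((p[d] - q[d]) ** 2 for d in range(len(q)))


def count_within(node, q, r):
    """Count the points under `node` that lie within distance r of q."""
    r2 = r * r
    dmin2 = 0.0  # squared distance from q to the nearest part of the box
    dmax2 = 0.0  # squared distance from q to the farthest corner of the box
    for d in range(len(q)):
        nearest = max(node.lo[d] - q[d], 0.0, q[d] - node.hi[d])
        farthest = max(abs(q[d] - node.lo[d]), abs(q[d] - node.hi[d]))
        dmin2 += nearest * nearest
        dmax2 += farthest * farthest
    if dmin2 > r2:
        return 0  # box entirely outside the sphere: prune the subtree
    if dmax2 <= r2:
        return node.count  # box entirely inside: accept without visiting points
    if node.points is not None:
        return sum(1 for p in node.points if dist2(p, q) <= r2)
    return count_within(node.left, q, r) + count_within(node.right, q, r)
```

Building the tree once with `Node(points)` and then calling `count_within(tree, q, r)` for many query positions avoids comparing every query against every point. The pruning tests are what make tree-based methods efficient, and the bookkeeping they need hints at why such code becomes harder to write once the tree is spread across many processors.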