Paper: Improving Astronomical Online Services With Apache Spark and Docker
Volume: 521, Astronomical Data Analysis Software and Systems XXVI
Page: 592
Authors: Schaaff, A.; Pineau, F.; Wali, N.; Trehiou, P.; Nauroy, J.
Abstract: To cope with the increasing volume of data that we will have to manage in the coming years, we are testing and prototyping implementations in the Big Data domain (both data and processing). The CDS provides an X-Match service, which cross-correlates sources between very large catalogues. This kind of processing is potentially heavy and requires appropriate techniques (data structuring and computing algorithms) to ensure good performance and to enable its use in online services. Apache Spark seems very promising, and we are improving the algorithms by using this technology in a suitable technical environment and by testing it with large datasets. Compared to Hadoop, Spark is designed to use memory as much as possible. We performed comparative tests against our X-Match service. In a first step, we used a limited internal test bed to learn and to gain the experience necessary to optimize the process. In a second step, we ran the tests on a rented external cluster of servers and ultimately reached an execution time better than that of the X-Match service. We detail this experiment, present the corresponding metrics, and give an idea of the cost.
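To illustrate what cross-correlating sources between two catalogues involves, here is a minimal sketch in plain Python. It is not the CDS X-Match algorithm and does not use Spark. It assumes each catalogue is a list of hypothetical `(id, ra_deg, dec_deg)` tuples and buckets one catalogue into declination zones (an assumed, simple form of data structuring) so that each source is compared only with nearby candidates. The function names, the `zone_height_deg` parameter, and the sample coordinates are all illustrative assumptions.

```python
import math
from collections import defaultdict


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(cat_a, cat_b, radius_deg, zone_height_deg=1.0):
    """Return (id_a, id_b, separation_deg) for all pairs within radius_deg.

    Each catalogue is a list of (id, ra_deg, dec_deg) tuples (assumed layout).
    """
    # Bucket catalogue B by declination zone so that only nearby
    # sources are compared, instead of every possible pair.
    zones = defaultdict(list)
    for src in cat_b:
        zones[math.floor(src[2] / zone_height_deg)].append(src)

    matches = []
    for id_a, ra_a, dec_a in cat_a:
        # A match cannot differ in declination by more than the radius,
        # so only the zones overlapping [dec - r, dec + r] are scanned.
        lo = math.floor((dec_a - radius_deg) / zone_height_deg)
        hi = math.floor((dec_a + radius_deg) / zone_height_deg)
        for zone in range(lo, hi + 1):
            for id_b, ra_b, dec_b in zones.get(zone, ()):
                sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
                if sep <= radius_deg:
                    matches.append((id_a, id_b, sep))
    return matches


# Made-up sample data, for illustration only.
catalogue_a = [("a1", 10.0, 20.0), ("a2", 150.0, -30.0)]
catalogue_b = [("b1", 10.0003, 20.0002), ("b2", 200.0, 5.0), ("b3", 150.0, -30.0)]
pairs = cross_match(catalogue_a, catalogue_b, radius_deg=5.0 / 3600.0)
```

With a 5 arcsecond radius, `pairs` associates `a1` with `b1` and `a2` with `b3`, while `b2` stays unmatched. In a distributed setting, the zone index could plausibly serve as a partitioning key, but that is an assumption rather than something the abstract states.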