Paper: GPUs and Python: A Recipe for Lightning-Fast Data Pipelines
Volume: 461, Astronomical Data Analysis Software and Systems XXI
Page: 53
Authors: Warner, C.; Packham, C.; Eikenberry, S. S.; Gonzalez, A.
Abstract: As detector arrays grow in pixel count and mosaics of arrays become more prevalent, the volume of data produced per night is increasing rapidly. As we look forward to the LSST era, when 30 TB of data per night will be produced, streamlined and rapid data reduction processes become critical. Recent developments in the computer industry have led to Graphics Processing Units (GPUs) that contain hundreds of processing cores, each of which can run hundreds of threads concurrently. Nvidia's Compute Unified Device Architecture (CUDA) platform allows developers to take advantage of these modern GPUs and design massively parallel algorithms that can provide speed-ups of up to a factor of around 100 over CPU implementations. Data pipelines are perfectly suited to reap the benefits of massive parallelization because many data-processing algorithms operate on a per-pixel basis over ever larger sets of images. In addition, the PyCUDA module (http://mathema.tician.de/software/pycuda) and the Python native C-API allow CUDA code to be integrated easily into Python code. Python has continued to gain momentum in the astronomical community, particularly as an attractive alternative to IDL or C for data pipelines. Thus, the ability to link GPU-optimized CUDA code directly into Python allows existing data pipeline frameworks to be reused with new parallel algorithms. We present initial results from parallelizing many of the more CPU-intensive algorithms in the Florida Analysis Tool Born Of Yearning for high quality scientific data (FATBOY) and discuss the implications for the future of data pipelines. Using an Nvidia GTX 580 GPU for our tests, we find speed-ups ranging from a factor of around 10 up to a factor of 300 over CPU implementations for individual routines. We believe it is possible to obtain an overall pipeline speed gain of a factor of 10-25 over traditionally built data pipelines. A speed gain of this magnitude would, for the first time, allow near real-time data processing: data sets that previously required hours to process could be reduced in minutes, concurrent with continuing observations, considerably optimizing the observational process.
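
The per-pixel parallelism and PyCUDA integration described above can be illustrated with a minimal sketch. The code below is not taken from FATBOY; the kernel, function names, and the choice of calibration step (dark subtraction plus flat-fielding) are illustrative assumptions. It shows the general pattern the abstract refers to: a CUDA kernel compiled and launched from Python with PyCUDA, with one GPU thread handling one pixel.

import numpy as np
import pycuda.autoinit              # creates a CUDA context on import
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

# Hypothetical CUDA C kernel: one thread per pixel applies (image - dark) / flat.
_mod = SourceModule("""
__global__ void calibrate(float *img, const float *dark,
                          const float *flat, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        img[i] = (img[i] - dark[i]) / flat[i];
}
""")
_calibrate = _mod.get_function("calibrate")

def calibrate_frame(image, dark, flat):
    """Dark-subtract and flat-field a 2-D frame on the GPU; returns a new array."""
    img_gpu  = gpuarray.to_gpu(np.ascontiguousarray(image, dtype=np.float32).ravel())
    dark_gpu = gpuarray.to_gpu(np.ascontiguousarray(dark, dtype=np.float32).ravel())
    flat_gpu = gpuarray.to_gpu(np.ascontiguousarray(flat, dtype=np.float32).ravel())
    n = img_gpu.size
    block = 256                           # threads per block
    grid = (n + block - 1) // block       # enough blocks to cover every pixel
    _calibrate(img_gpu, dark_gpu, flat_gpu, np.int32(n),
               block=(block, 1, 1), grid=(grid, 1))
    return img_gpu.get().reshape(image.shape)

# Example use with hypothetical calibration frames:
# reduced = calibrate_frame(raw_frame, master_dark, master_flat)

Because every pixel is independent, such a kernel maps naturally onto thousands of concurrent GPU threads, which is the property that yields the large per-routine speed-ups reported in the abstract.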