		| Paper: | 
		Producing an Infrared Multiwavelength Galactic Plane Atlas Using Montage, Pegasus, and Amazon Web Services | 
	 
	
		| Volume: | 
		485, Astronomical Data Analysis Software and Systems XXIII | 
	 
	
		| Page: | 
		211 | 
	 
	
		| Authors: | 
		Rynge, M.; Juve, G.; Kinney, J.; Good, J.; Berriman, B.; Merrihew, A.; Deelman, E. | 
	 
	
	
		| Abstract: | 
		In this paper, we describe how to leverage cloud resources to generate
 large-scale mosaics of the galactic plane in multiple wavelengths. Our
 goal is to generate a 16-wavelength infrared Atlas of the Galactic Plane
 at a common spatial sampling of 1 arcsec, with the images processed so that they appear
 to have been measured with a single instrument. This will be achieved by
 using the Montage image mosaic engine to process observations from the 2MASS, GLIMPSE, MIPSGAL, MSX and WISE datasets, over a wavelength range
 of 1 μm to 24 μm, and by using the Pegasus Workflow Management
 System to manage the workload. When complete, the Atlas will be made
 available to the community as a data product.
 
 We are generating images that cover ±180° in Galactic longitude
 and ±20° in Galactic latitude, to the extent permitted by the
 spatial coverage of each dataset. Each image will be 5°x5° in
 size (including an overlap of 1° with neighboring tiles), resulting
 in an atlas of 1,001 images. The final size will be about 50 TB.
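
 The tile count can be checked with a short sketch. It assumes, hypothetically, that tile centers sit on a regular grid spaced 4° apart (the 5° tile size minus the 1° overlap) with both ends of each range included; the abstract does not state the actual layout, so this is only one arrangement consistent with the quoted figure. The function name `tile_centers` is illustrative.

```python
# Assumed layout (not stated in the abstract): 5-degree tiles whose
# centers are 4 degrees apart, so neighbors share a 1-degree overlap.
TILE_SIZE_DEG = 5
OVERLAP_DEG = 1
STEP_DEG = TILE_SIZE_DEG - OVERLAP_DEG


def tile_centers(lon_half_range=180, lat_half_range=20, step=STEP_DEG):
    """Return (longitude, latitude) tile centers in degrees, ends included."""
    lons = range(-lon_half_range, lon_half_range + 1, step)
    lats = range(-lat_half_range, lat_half_range + 1, step)
    return [(lon, lat) for lon in lons for lat in lats]


centers = tile_centers()
# 91 longitude steps x 11 latitude steps
print(len(centers))
```

 Under this assumption the columns at -180° and +180° coincide on the sky; the abstract does not say how that seam is handled.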
 
 This paper will focus on the computational challenges, solutions,
 and lessons learned in producing the Atlas. To manage the computation we
 are using the Pegasus Workflow Management System, a mature, highly
 fault-tolerant system now in release 4.2.2 that has found wide
 applicability across many science disciplines. A scientific workflow
 describes the dependencies between tasks, and in most cases it is
 expressed as a directed acyclic graph in which the nodes are
 tasks and the edges denote the task dependencies. A defining property
 of a scientific workflow is that it manages data flow between
 tasks. Applied to the galactic plane project, each 5°x5° mosaic is a
 Pegasus workflow. Pegasus is used to fetch the source images, execute
 the image mosaicking steps of Montage, and store the final outputs in a
 storage system.
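
 The directed-acyclic-graph idea can be illustrated with a minimal sketch that uses only the Python standard library, not the Pegasus API. The node names borrow Montage module names (mProjectPP, mDiffFit, mBgModel, mBackground, mAdd); the `fetch_images` and `store_output` nodes and the single-chain shape are simplifying assumptions, since a real mosaic workflow runs many instances of each step in parallel.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is a task; its value is the set of tasks it depends on.
# This is a toy, one-task-per-step picture of a mosaic workflow.
workflow = {
    "fetch_images": set(),
    "mProjectPP": {"fetch_images"},            # reproject inputs
    "mDiffFit": {"mProjectPP"},                # fit overlap differences
    "mBgModel": {"mDiffFit"},                  # model background corrections
    "mBackground": {"mProjectPP", "mBgModel"}, # apply the corrections
    "mAdd": {"mBackground"},                   # co-add into the mosaic
    "store_output": {"mAdd"},
}


def execution_order(dag):
    """Return one ordering in which every task follows its dependencies."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)
print(order)
```

 A workflow manager does much more than order tasks (data staging, retries, scheduling), but the dependency graph is the structure it works from.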
 
 As these workflows are very I/O intensive, care has to be taken when
 choosing the infrastructure on which to execute them. In our setup,
 we chose to use dynamically provisioned compute clusters running on the
 Amazon Elastic Compute Cloud (EC2). All our instances use the same
 base image, which is configured to come up as a master node by
 default. The master node is a central instance from which the workflow
 can be managed. Additional worker instances are provisioned and
 configured to accept work assignments from the master node. The system
 allows workers to be added or removed in an ad hoc fashion, and can be
 run in large configurations.
 
 To date we have performed 245,000 CPU hours of computing and generated
 7,029 images totaling 30 TB. With the current setup, our
 runtime would be 340,000 CPU hours for the whole project. Using spot
 m2.4xlarge instances, the cost would be approximately $5,950. Using
 faster AWS instances, such as cc2.8xlarge, could potentially decrease the
 total CPU hours and further reduce the compute costs. The paper will
 explore these tradeoffs. | 
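
 The cost figures above imply a unit price that is easy to work out. The sketch below uses only the numbers quoted in the abstract, plus one labeled assumption: that an m2.4xlarge instance provides 8 cores, which is used to convert the per-CPU-hour rate into a per-instance-hour rate. The variable names are illustrative.

```python
# Figures quoted in the abstract.
TOTAL_CPU_HOURS = 340_000   # projected for the whole project
DONE_CPU_HOURS = 245_000    # performed so far
TOTAL_COST_USD = 5_950      # approximate spot-instance cost

# Assumption (not from the abstract): cores per m2.4xlarge instance.
CORES_PER_INSTANCE = 8

cost_per_cpu_hour = TOTAL_COST_USD / TOTAL_CPU_HOURS
cost_per_instance_hour = cost_per_cpu_hour * CORES_PER_INSTANCE
remaining_cpu_hours = TOTAL_CPU_HOURS - DONE_CPU_HOURS
remaining_cost = remaining_cpu_hours * cost_per_cpu_hour

print(f"USD per CPU hour:      {cost_per_cpu_hour:.4f}")
print(f"USD per instance hour: {cost_per_instance_hour:.2f}")
print(f"Remaining CPU hours:   {remaining_cpu_hours}")
print(f"Remaining cost (USD):  {remaining_cost:.2f}")
```

 The remaining-cost line assumes the unit price stays constant, which spot pricing does not guarantee.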
	 
	