		| Paper: | 
		Exorcising the Ghost in the Machine: Synthetic Spectral Data Cubes for Assessing Big Data Algorithms | 
	 
	
		| Volume: | 
		495, Astronomical Data Analysis Software and Systems XXIV (ADASS XXIV) | 
	 
	
		| Page: | 
		57 | 
	 
	
		| Authors: | 
		Araya, M.; Solar, M.; Mardones, D.; Hochfärber, T. | 
	 
	
	
		| Abstract: | 
		The size and quantity of the data being generated by large astronomical
 projects like ALMA require a paradigm change in astronomical data analysis.
 Complex data, such as highly sensitive spectroscopic data in the form of large
 data cubes, are not only difficult to manage, transfer and visualize, but they
 make traditional data analysis techniques unfeasible. Consequently, attention has turned to machine learning and
 artificial intelligence techniques, to develop approximate and adaptive methods
 for astronomical data analysis within a reasonable computational time.
 Unfortunately, these techniques are usually suboptimal, stochastic and strongly
 dependent on their parameters, which could easily turn them into “a ghost in the
 machine” for astronomers and practitioners. Therefore, a proper assessment of
 these methods is not only desirable but mandatory for trusting them in
 large-scale usage. The problem is that positively verifiable results are scarce
 in astronomy and, moreover, science using bleeding-edge instrumentation
 naturally lacks reference values. We propose Astronomical SYnthetic Data
 Observations (ASYDO), a virtual service that generates synthetic spectroscopic
 data in the form of data cubes. The objective of the tool is not to produce
 accurate astrophysical simulations, but to generate a large amount of labelled
 synthetic data with which to assess advanced computing algorithms for astronomy and to
 develop novel Big Data algorithms. The synthetic data is generated using a set
 of spectral lines, template functions for spatial and spectral distributions,
 and simple models that produce reasonable synthetic observations. Emission lines
 are obtained automatically using IVOA's SLAP protocol (or from a relational
 database) and their spectral profiles correspond to distributions in the
 exponential family. The spatial distributions correspond to simple functions
 (e.g., 2D Gaussian), or to scalable template objects. The intensity, broadening
 and radial velocity of each line are given by very simple and naive physical
 models, yet ASYDO's generic implementation supports new user-made models, which
 potentially allows adding more realistic simulations. The resulting data cube is
 saved as a FITS file, also including all the tables and images used for
 generating the cube. We expect to implement ASYDO as a virtual observatory
 service in the near future. | 
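The generation scheme described in the abstract (a set of spectral lines, a spectral profile per line, and a spatial template) can be illustrated with a minimal sketch. This is not ASYDO's implementation: the function names, parameters and values below are hypothetical, the spectral profile is assumed to be a Gaussian (one member of the exponential family), the spatial template is the 2D Gaussian mentioned as an example, and the SLAP line lookup, radial-velocity shifts and FITS output are omitted. Plain Python lists are used only to keep the sketch dependency-free.

```python
import math


def gaussian(x, mu, sigma):
    """Unnormalised Gaussian profile with peak value 1 at x == mu."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def synthetic_cube(n_chan, n_y, n_x, lines, center, spatial_sigma):
    """Build cube[channel][y][x] as a sum of separable line components.

    lines: list of (channel_center, channel_sigma, peak_intensity) tuples.
    All names and values are illustrative assumptions, not ASYDO's API.
    """
    cy, cx = center
    # 2D Gaussian spatial template, shared by every line in this toy model.
    spatial = [
        [gaussian(y, cy, spatial_sigma) * gaussian(x, cx, spatial_sigma)
         for x in range(n_x)]
        for y in range(n_y)
    ]
    cube = [[[0.0] * n_x for _ in range(n_y)] for _ in range(n_chan)]
    labels = []
    for chan_center, chan_sigma, peak in lines:
        for c in range(n_chan):
            # Spectral weight of this line in channel c.
            weight = peak * gaussian(c, chan_center, chan_sigma)
            for y in range(n_y):
                for x in range(n_x):
                    cube[c][y][x] += weight * spatial[y][x]
        # Keep the ground truth next to the data so algorithms can be scored.
        labels.append({"channel": chan_center, "sigma": chan_sigma, "peak": peak})
    return cube, labels


cube, labels = synthetic_cube(
    n_chan=32, n_y=16, n_x=16,
    lines=[(8, 1.5, 1.0), (20, 2.0, 0.5)],
    center=(8, 8), spatial_sigma=3.0,
)
```

Returning the labels alongside the cube reflects the stated purpose of the tool: the known line positions, widths and intensities serve as reference values against which a detection or fitting algorithm can be assessed.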