		| Paper: | 
		Content-Aware Data Discovery on VO Catalogs Using Succinct Representations | 
	 
	
		| Volume: | 
		527, Astronomical Data Analysis Software and Systems XXIX | 
	 
	
		| Page: | 
		13 | 
	 
	
		| Authors: | 
		Araya, M.; Arroyuelo, D.; Saldías, C.; Solar, M. | 
	 
	
	
		| Abstract: | 
		VO services and online astronomical archives in general allow users to discover data
 resources based on the metadata that each resource provides. Content-aware data
 discovery is the process of searching for patterns within the content of the
 resources, for example over the values of astronomical catalogs, and returning
 how many matches each resource produces. While a combination of existing
 protocols and services might produce this result, scaling up to a large number
 of resources while maintaining reasonable query speeds is a challenging problem.
 We propose using succinct representations to produce compressed intermediate
 files where these queries can be performed with low computational complexity. In
 particular, we focus on tabular data resources (i.e., catalogs), where a
 content-aware query can be cast as an attribute-retrieval problem. We show
 that these intermediate files can be computed directly from VOTable results from
 TAP services, so a succinct (and compressed) representation of any catalog
 available over this standard can be obtained. We compare our results with
 standard SQL queries over a popular DBMS, showing that for most of the queries
 our approach outperforms the state of the art. | 
	 
	