Abstract: |
The NASA Astrophysics Data System (ADS) is the primary Digital Library portal for researchers in astronomy and astrophysics. Over the past 30 years,
the ADS has gone from being an astronomy-focused bibliographic database to an open
digital library system supporting research in space and (soon) Earth sciences. This paper describes the evolution of the ADS, its capabilities, and the technological
infrastructure underpinning it.
We give an overview of the ADS’s original architecture, constructed primarily
around simple database models. This bespoke system allowed for the efficient indexing
of metadata and citations, the digitization and archival of full-text articles, and the rapid
development of discipline-specific capabilities running on commodity hardware. The
move towards a cloud-based microservices architecture and an open-source search engine in the late 2010s marked a significant shift, bringing full-text search capabilities,
a modern API, higher uptime, more reliable data retrieval, and integration of advanced
visualizations and analytics.
Another crucial evolution came with the gradual and ongoing incorporation of
Machine Learning and Natural Language Processing algorithms in our data pipelines.
Originally used for information extraction and classification tasks, NLP and ML techniques are now being developed to improve metadata enrichment, search, notifications,
and recommendations. We describe how these computational techniques are being embedded into our software infrastructure, the challenges faced, and the benefits reaped.
Finally, we describe the future prospects of the ADS and its ongoing
expansion, discussing the challenges of managing an interdisciplinary information system in the era of AI and Open Science, where information is abundant and technology is
transformative, but the trustworthiness of both can be elusive. |