Machine Learning Bias and the Annotation of Large Databases of Astronomical Objects


Paper:	Machine Learning Bias and the Annotation of Large Databases of Astronomical Objects
Volume:	538, ADASS XXXII
Page:	162
Authors:	Hunter Goddard; Lior Shamir
DOI:	10.26624/BFMV6514
Abstract:	A common approach to annotating astronomical databases is to apply machine learning (ML), and specifically artificial neural networks (ANNs). While ANNs can be invaluable for astronomy, they also have several downsides. Here, we study the possible disadvantages of ANNs. Our results show that annotations produced with ML can have subtle but consistent biases. These biases are very difficult to detect, can vary across different parts of the sky, and are not intuitive for the users of data products annotated by ANNs. Because these catalogs are, in many cases, very large, such subtle biases can lead to statistically significant observations that result from the neural network bias rather than from a true property of the Universe. Based on these observations, catalogs annotated by current ANNs should be used cautiously, and statistical observations enabled by such catalogs should be analyzed in light of possible biases in the machine learning systems. The results reinforce the need for further research on explainable neural network architectures.
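
The point about catalog size can be illustrated with a short calculation. The sketch below is not the authors' analysis; it assumes a hypothetical binary annotation task in which the two classes are truly split 50/50 and the annotator favors one class by a fixed 0.5% (the names `expected_z_score`, `two_sided_p_value`, and the value of `BIAS` are illustrative assumptions). It shows how the same small bias is invisible in a small catalog yet appears highly significant in a large one.

```python
import math


def expected_z_score(n_objects: int, bias: float) -> float:
    """Expected z-score of the class-A excess when testing against a 50/50 split.

    `bias` is the hypothetical extra probability with which the annotator
    assigns class A; the true split is assumed to be exactly 50/50.
    """
    expected_count_a = n_objects * (0.5 + bias)
    null_mean = n_objects * 0.5               # mean count under the 50/50 null
    null_std = math.sqrt(n_objects * 0.25)    # binomial standard deviation under the null
    return (expected_count_a - null_mean) / null_std


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value from the normal approximation to the binomial."""
    return math.erfc(abs(z) / math.sqrt(2.0))


BIAS = 0.005  # assumed annotation bias of 0.5%, chosen only for illustration

for n in (1_000, 100_000, 10_000_000):
    z = expected_z_score(n, BIAS)
    print(f"N = {n:>10,}  z = {z:8.3f}  p = {two_sided_p_value(z):.3g}")
```

Under these assumptions the expected z-score grows as 2 * bias * sqrt(N), so a bias too small to notice in a thousand objects can dominate a statistical test on millions of objects.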