amc: The Automated Mission Classifier for Telescope Bibliographies
This tool addresses the scalability problem that astronomers and librarians face in maintaining telescope bibliographies, although the contribution is incremental: it applies existing LLM methods to a new domain.
The authors address the burden of manually labeling telescope references in astronomical literature by developing the Automated Mission Classifier (amc), which uses large language models to automate the process. A modified version achieves a macro F1 score of 0.84 on the held-out test set of the TRACS Kaggle challenge.
Telescope bibliographies record the pulse of astronomy research by capturing publication statistics and citation metrics for telescope facilities. Robust and scalable bibliographies ensure that we can measure the scientific impact of our facilities and archives. However, the growing rate of publications threatens to outpace our ability to manually label astronomical literature. We therefore present the Automated Mission Classifier (amc), a tool that uses large language models (LLMs) to identify and categorize telescope references by processing large quantities of paper text. A modified version of amc performs well on the TRACS Kaggle challenge, achieving a macro $F_1$ score of 0.84 on the held-out test set. amc is also applicable to telescopes beyond the TRACS challenge; we originally developed the software to identify papers featuring scientific results from NASA missions. Additionally, we investigate how amc can be used to interrogate historical datasets and surface potential label errors. Our work demonstrates that LLM-based applications offer powerful and scalable assistance for library sciences.
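For readers unfamiliar with the metric, the macro $F_1$ score quoted above is the unweighted mean of the per-class $F_1$ scores, so rare categories count as much as common ones. The following is a minimal sketch of that computation, not the authors' evaluation code; the function name `macro_f1` and the category labels in the example are hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores.

    Illustrative sketch only: the label names used with this function
    are hypothetical, not the actual TRACS categories.
    """
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for six papers mentioning one telescope
y_true = ["science", "science", "mention", "instrumentation", "mention", "science"]
y_pred = ["science", "mention", "mention", "instrumentation", "mention", "science"]
score = macro_f1(y_true, y_pred)
```

In this toy example the per-class scores are 0.8, 0.8, and 1.0, so the macro average is about 0.867. Because each class contributes equally, a classifier that ignores a rare category is penalized even if its overall accuracy is high.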