LG · ML · Mar 11

When should we trust the annotation? Selective prediction for molecular structure retrieval from mass spectra

Mira Jürgens, Gaetan De Waele, Morteza Rakhshaninejad, Willem Waegeman

arXiv:2603.10950v1 · 6.01 citations · h-index: 37

Predicted impact: top 63% in LG · last 90 days

Originality: Incremental advance

AI Analysis

This work addresses the need for reliable predictions in high-stakes applications such as clinical metabolomics and environmental screening, where incorrect annotations can have serious consequences. It is an incremental advance, since it builds on existing uncertainty quantification methods.

The paper tackles the high error rates of machine learning methods for identifying molecular structures from mass spectra. It introduces a selective prediction framework that lets models abstain when uncertainty is too high, and shows that computationally inexpensive confidence measures achieve strong risk-coverage tradeoffs. With distribution-free risk control, practitioners can specify a tolerable error rate and obtain a subset of annotations that satisfies it with high probability.

Machine learning methods for identifying molecular structures from tandem mass spectra (MS/MS) have advanced rapidly, yet current approaches still exhibit significant error rates. In high-stakes applications such as clinical metabolomics and environmental screening, incorrect annotations can have serious consequences, making it essential to determine when a prediction can be trusted. We introduce a selective prediction framework for molecular structure retrieval from MS/MS spectra, enabling models to abstain from predictions when uncertainty is too high. We formulate the problem within the risk-coverage tradeoff framework and comprehensively evaluate uncertainty quantification strategies at two levels of granularity: fingerprint-level uncertainty over predicted molecular fingerprint bits, and retrieval-level uncertainty over candidate rankings. We compare scoring functions including first-order confidence measures, aleatoric and epistemic uncertainty estimates from second-order distributions, as well as distance-based measures in the latent space. All experiments are conducted on the MassSpecGym benchmark. Our analysis reveals that while fingerprint-level uncertainty scores are poor proxies for retrieval success, computationally inexpensive first-order confidence measures and retrieval-level aleatoric uncertainty achieve strong risk-coverage tradeoffs across evaluation settings. We demonstrate that by applying distribution-free risk control via generalization bounds, practitioners can specify a tolerable error rate and obtain a subset of annotations satisfying that constraint with high probability.
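To make the risk-coverage tradeoff and the risk-control step concrete, here is a minimal sketch on synthetic data. It is not the paper's implementation: the abstract only mentions "generalization bounds", so the Hoeffding bound with a union bound over candidate thresholds is an assumed stand-in, and the function names `risk_coverage_curve` and `calibrate_threshold`, the threshold grid, and the simulated confidence scores are all hypothetical.

```python
import math
import random


def risk_coverage_curve(confidences, correct):
    """Return (coverage, selective_risk) pairs, accepting the most confident predictions first."""
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    curve, errors = [], 0
    for k, i in enumerate(order, start=1):
        errors += 0 if correct[i] else 1
        curve.append((k / len(order), errors / k))
    return curve


def calibrate_threshold(confidences, correct, alpha, delta, thresholds):
    """Pick the lowest threshold whose upper bound on selective risk is <= alpha.

    Assumption: a Hoeffding bound per threshold, with a union bound over all
    candidate thresholds so the bounds hold simultaneously with probability
    at least 1 - delta. Returns None if no threshold qualifies.
    """
    best = None
    for t in sorted(thresholds, reverse=True):
        accepted = [c for s, c in zip(confidences, correct) if s >= t]
        if not accepted:
            continue  # nothing accepted at this threshold, so no estimate
        n = len(accepted)
        risk = sum(1 for c in accepted if not c) / n
        bound = risk + math.sqrt(math.log(len(thresholds) / delta) / (2 * n))
        if bound <= alpha:
            best = t  # a lower threshold means higher coverage
    return best


# Synthetic calibration set (assumption): a prediction with confidence s
# is correct with probability s.
rng = random.Random(0)
scores = [rng.random() for _ in range(5000)]
hits = [rng.random() < s for s in scores]

curve = risk_coverage_curve(scores, hits)
grid = [i / 20 for i in range(20)]
alpha = 0.2
t = calibrate_threshold(scores, hits, alpha=alpha, delta=0.05, thresholds=grid)

kept = [c for s, c in zip(scores, hits) if t is not None and s >= t]
print("threshold:", t, "coverage:", len(kept) / len(scores))
```

At deployment, predictions scoring below the returned threshold would be abstained on. The guarantee relies on the calibration data being exchangeable with the data seen at test time, and a tighter bound would typically yield higher coverage for the same tolerated error rate.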
