LG | QM | Jan 3, 2024

AIRI: Predicting Retention Indices and their Uncertainties using Artificial Intelligence

arXiv:2401.01506v2 | 11 citations | h-index: 16 | J Chem Inf Model
Originality: Synthesis-oriented
AI Analysis

This work supports efficient chemical structure identification by providing accurate RI predictions together with a per-prediction uncertainty estimate. It is incremental in that it applies a known deep learning approach to a specific domain.

Because creating libraries of observed Kováts retention indices (RI) for chemical identification is laborious, the authors developed a deep neural network, AIRI, that predicts RI values from structure for standard semipolar columns. It achieved a mean absolute error of 15.1 and a 95th percentile absolute error of 46.5, and was used to predict RI values for the NIST EI-MS spectral libraries. They also quantified the uncertainty of each prediction using an ensemble of 8 networks: the resulting Z scores had a standard deviation of 1.52, and the 95th percentile absolute Z score corresponded to a mean RI value of 42.6.

The Kováts Retention index (RI) is a quantity measured using gas chromatography and commonly used in the identification of chemical structures. Creating libraries of observed RI values is a laborious task, so we explore the use of a deep neural network for predicting RI values from structure for standard semipolar columns. This network generated predictions with a mean absolute error of 15.1 and, in a quantification of the tail of the error distribution, a 95th percentile absolute error of 46.5. Because of the Artificial Intelligence Retention Indices (AIRI) network's accuracy, it was used to predict RI values for the NIST EI-MS spectral libraries. These RI values are used to improve chemical identification methods and the quality of the library. Estimating uncertainty is an important practical need when using prediction models. To quantify the uncertainty of our network for each individual prediction, we used the outputs of an ensemble of 8 networks to calculate a predicted standard deviation for each RI value prediction. This predicted standard deviation was corrected to follow the error between observed and predicted RI values. The Z scores using these predicted standard deviations had a standard deviation of 1.52 and a 95th percentile absolute Z score corresponding to a mean RI value of 42.6.
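
The ensemble-based uncertainty estimate described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how the predicted standard deviation was corrected, so the single scale factor in `fit_std_scale` is an assumed stand-in, and all function names and the synthetic data are hypothetical. The sketch shows how a predicted standard deviation, Z scores, a mean absolute error and a 95th percentile absolute error might be computed from the outputs of 8 networks.

```python
import numpy as np


def ensemble_mean_and_std(member_preds):
    """Combine per-network RI predictions, shape (n_networks, n_compounds)."""
    member_preds = np.asarray(member_preds, dtype=float)
    mean = member_preds.mean(axis=0)
    # Sample standard deviation across ensemble members (ddof=1).
    std = member_preds.std(axis=0, ddof=1)
    return mean, std


def fit_std_scale(observed, mean, std):
    """Assumed correction (not from the paper): one scalar that makes
    the root-mean-square Z score equal 1 on a calibration set."""
    z = (np.asarray(observed, dtype=float) - mean) / std
    return float(np.sqrt(np.mean(z ** 2)))


def z_scores(observed, mean, std, scale=1.0):
    """Z score = (observed - predicted) / corrected predicted std."""
    return (np.asarray(observed, dtype=float) - mean) / (scale * std)


def error_summary(observed, predicted):
    """Return (mean absolute error, 95th percentile absolute error)."""
    abs_err = np.abs(np.asarray(observed, dtype=float)
                     - np.asarray(predicted, dtype=float))
    return float(abs_err.mean()), float(np.percentile(abs_err, 95))


if __name__ == "__main__":
    # Synthetic stand-in data; no values here come from the paper.
    rng = np.random.default_rng(0)
    n_networks, n_compounds = 8, 2000
    true_ri = rng.uniform(500.0, 3000.0, size=n_compounds)
    noise_level = rng.uniform(5.0, 30.0, size=n_compounds)
    member_preds = true_ri + noise_level * rng.normal(
        size=(n_networks, n_compounds))
    observed = true_ri + rng.normal(scale=10.0, size=n_compounds)

    mean, std = ensemble_mean_and_std(member_preds)
    scale = fit_std_scale(observed, mean, std)
    z = z_scores(observed, mean, std, scale)
    mae, p95 = error_summary(observed, mean)

    print(f"MAE: {mae:.1f}, 95th percentile abs. error: {p95:.1f}")
    print(f"std scale factor: {scale:.2f}, std of Z scores: {z.std():.2f}")
```

In practice the scale factor would be fitted on held-out data rather than on the compounds being scored. A correction that varies with the size of the predicted standard deviation is also possible but is beyond this sketch.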

