LG DATA-AN Sep 30, 2022

New Metric Formulas that Include Measurement Errors in Machine Learning for Natural Sciences

arXiv:2209.15588

Originality: Incremental advance

AI Analysis

The paper addresses a critical deficiency in physics and other fields, where ignoring measurement errors can lead to false conclusions about theories or patterns.

The paper tackles the problem of neglecting measurement errors in machine learning performance metrics for natural sciences, deriving new formulas for regression and classification metrics that account for these errors, resulting in more pessimistic but statistically confident estimations.

Machine learning is widely applied to physics problems in the scientific literature. Both regression and classification problems are addressed by a large array of techniques that involve learning algorithms. Unfortunately, the measurement errors of the data used to train machine learning models are almost always neglected. This leads to estimates of model performance (and thus of generalisation power) that are too optimistic, since the target variables (what one wants to predict) are always assumed to be correct. In physics, this is a serious deficiency, as it can lead to the belief that theories or patterns exist where, in reality, they do not. This paper addresses the deficiency by deriving formulas for commonly used metrics (for both regression and classification problems) that take into account the measurement errors of the target variables. The new formulas give an estimate of each metric that is always more pessimistic than the one obtained with the classical formulas, which do not take measurement errors into account. The formulas given here are of general validity, completely model-independent, and can be applied without limitations. Thus, one can analyze, with statistical confidence, whether relationships exist when dealing with measurements that carry errors of any kind. The formulas have wide applicability outside physics and can be used in all problems where measurement errors are relevant to the conclusions of a study.
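To make the idea in the abstract concrete, here is a minimal sketch of how a target error bar can push a regression metric in the pessimistic direction. It is not the paper's formula, which this page does not reproduce. It assumes each true target lies within a known half-width `y_err[i]` of its measured value, and uses the triangle inequality to bound the mean absolute error from above. The function names `classical_mae` and `worst_case_mae` and the sample numbers are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def classical_mae(y_measured: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error that treats the measured targets as exact."""
    if len(y_measured) != len(y_pred) or len(y_measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(m - p) for m, p in zip(y_measured, y_pred))
    return total / len(y_measured)


def worst_case_mae(
    y_measured: Sequence[float],
    y_pred: Sequence[float],
    y_err: Sequence[float],
) -> float:
    """Upper bound on the MAE against the unknown true targets.

    Assumption (not from the paper): |y_true[i] - y_measured[i]| <= y_err[i].
    The triangle inequality then gives
    |y_true[i] - y_pred[i]| <= |y_measured[i] - y_pred[i]| + y_err[i].
    """
    if not (len(y_measured) == len(y_pred) == len(y_err)) or len(y_measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(e < 0 for e in y_err):
        raise ValueError("error half-widths must be non-negative")
    total = sum(abs(m - p) + e for m, p, e in zip(y_measured, y_pred, y_err))
    return total / len(y_measured)


if __name__ == "__main__":
    # Illustrative numbers only; they do not come from the paper.
    y_measured = [1.0, 2.0, 3.0]
    y_pred = [1.5, 2.0, 2.0]
    y_err = [0.25, 0.25, 0.25]
    print("classical MAE: ", classical_mae(y_measured, y_pred))
    print("worst-case MAE:", worst_case_mae(y_measured, y_pred, y_err))
```

Under this assumption the bound equals the classical MAE plus the mean error half-width, so it can never be smaller than the classical value. The paper's own formulas are statistical rather than worst-case, so their numerical values will generally differ from this bound.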
