LG · AI · Dec 8, 2024

Towards Modeling Data Quality and Machine Learning Model Performance

arXiv:2412.05882v1 · 1 citation · h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for better trust and performance measurement in machine learning by modeling data quality, but it is incremental, as it builds on existing concepts such as the signal-to-noise ratio (SNR).

The paper tackles the problem of quantifying how uncertainty and noise in data affect machine learning model performance. It proposes a new metric, the deterministic-non-deterministic ratio (DDR), based on the signal-to-noise ratio, and shows through synthetic-data experiments that accuracy varies with DDR, so that DDR-accuracy curves can be used to assess model performance.

Understanding the effect of uncertainty and noise in data on machine learning models (MLMs) is crucial for developing trust and measuring performance. In this paper, a new model is proposed to quantify the effect of uncertainty and noise in data on MLMs. Using the concept of signal-to-noise ratio (SNR), a new metric called the deterministic-non-deterministic ratio (DDR) is proposed to formulate the performance of a model. Using synthetic data in experiments, we show how accuracy can change with DDR and how DDR-accuracy curves can be used to determine the performance of a model.
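The abstract does not give the formula for DDR. As a minimal sketch, assuming DDR is defined by direct analogy with SNR as the ratio of the variance of the deterministic component to the variance of the non-deterministic (noise) component, the Python below builds a DDR-accuracy curve on synthetic 1-D data. The data generator, the `ddr` formula, the nearest-centroid model, and every function name here are hypothetical illustrations, not the paper's method.

```python
import math
import random
import statistics


def make_data(n, sigma, rng):
    """Synthetic 1-D binary data: label is 1 when deterministic part + noise > 0.

    Assumption (hypothetical): deterministic component d(x) = x with
    x ~ U(-1, 1); non-deterministic component e ~ N(0, sigma^2).
    """
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    noise = [rng.gauss(0.0, sigma) for _ in range(n)]
    ys = [1 if x + e > 0 else 0 for x, e in zip(xs, noise)]
    return xs, noise, ys


def ddr(deterministic, non_deterministic):
    """Hypothetical DDR, by analogy with SNR: ratio of component variances."""
    return statistics.pvariance(deterministic) / statistics.pvariance(non_deterministic)


def fit_centroid_threshold(xs, ys):
    """1-D nearest-centroid classifier: threshold halfway between class means."""
    m0 = statistics.fmean([x for x, y in zip(xs, ys) if y == 0])
    m1 = statistics.fmean([x for x, y in zip(xs, ys) if y == 1])
    return (m0 + m1) / 2.0


def accuracy(threshold, xs, ys):
    """Fraction of points whose predicted label (x > threshold) matches y."""
    preds = [1 if x > threshold else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


def ddr_accuracy_curve(sigmas, n=2000, seed=0):
    """For each noise scale (sigma > 0), train on one sample, test on another,
    and return (DDR on the test set, test accuracy) pairs."""
    rng = random.Random(seed)
    curve = []
    for sigma in sigmas:
        x_tr, _, y_tr = make_data(n, sigma, rng)
        x_te, e_te, y_te = make_data(n, sigma, rng)
        t = fit_centroid_threshold(x_tr, y_tr)
        curve.append((ddr(x_te, e_te), accuracy(t, x_te, y_te)))
    return curve


if __name__ == "__main__":
    for d, acc in ddr_accuracy_curve([0.05, 0.2, 0.5, 1.0, 2.0]):
        # Report DDR both as a raw ratio and in decibels, as is usual for SNR.
        print(f"DDR = {d:8.2f} ({10 * math.log10(d):6.1f} dB)  accuracy = {acc:.3f}")
```

Sweeping the noise scale `sigma` moves the curve from high DDR to low DDR while the model and noise distribution stay fixed, which keeps the curve interpretable. Substituting another classifier would only change `fit_centroid_threshold` and `accuracy`.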

Code Implementations: 1 repo
