LG, Dec 23, 2025

Improving ML Training Data with Gold-Standard Quality Metrics

arXiv:2512.20577v1, 2 citations, h-index: 1
Originality: Incremental advance
AI Analysis

This paper addresses data quality control for machine learning practitioners. Its contribution is incremental, since it builds on existing statistical approaches.

The paper tackles the problem of inconsistent quality in hand-tagged training data by proposing statistical methods to measure tagging consistency and agreement. It shows that agreement metrics are more reliable when recorded over multiple tagging iterations, that declining variance across those recordings signals improving data quality, and that high-quality data can be collected without multiple tags per item.

Hand-tagged training data is essential to many machine learning tasks. However, training data quality control has received little attention in the literature, despite data quality varying considerably with the tagging exercise. We propose methods to evaluate and enhance the quality of hand-tagged training data using statistical approaches to measure tagging consistency and agreement. We show that agreement metrics give more reliable results if recorded over multiple iterations of tagging, where declining variance in such recordings is an indicator of increasing data quality. We also show one way a tagging project can collect high-quality training data without requiring multiple tags for every work item, and that a tagger burn-in period may not be sufficient for minimizing tagger errors.
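
The idea of recording agreement over several tagging iterations and watching its variance can be illustrated with a minimal sketch. The abstract does not name a specific agreement metric, so Cohen's kappa is used here purely as an assumption. The function names, the window size, and the toy tag lists are hypothetical and are not taken from the paper.

```python
from collections import Counter
from statistics import pvariance


def cohen_kappa(tags_a, tags_b):
    """Cohen's kappa for two taggers who labelled the same items.

    Assumption: the paper may use a different agreement metric.
    """
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("need two equal-length, non-empty tag lists")
    n = len(tags_a)
    # Observed agreement: fraction of items given the same tag.
    observed = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    # Chance agreement from each tagger's own label frequencies.
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both taggers used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def rolling_variance(scores, window):
    """Population variance of each consecutive window of agreement scores."""
    return [
        pvariance(scores[i:i + window])
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical data: one (tagger A, tagger B) pair of tag lists per iteration.
iterations = [
    (["pos", "neg", "pos", "neg"], ["pos", "pos", "neg", "neg"]),
    (["pos", "neg", "pos", "neg"], ["pos", "neg", "neg", "neg"]),
    (["pos", "neg", "pos", "neg"], ["pos", "neg", "pos", "neg"]),
    (["neg", "neg", "pos", "pos"], ["neg", "neg", "pos", "pos"]),
]

kappas = [cohen_kappa(a, b) for a, b in iterations]
variances = rolling_variance(kappas, window=2)
print("kappa per iteration:", kappas)
print("rolling variance:", variances)
```

Under this reading, a rolling variance that shrinks toward zero suggests the taggers' agreement has stabilised. A single high kappa value from one iteration says less on its own.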

