LG · GT · Jan 27

Calibration without Ground Truth

arXiv:2601.19862v1
Originality: Highly original
AI Analysis

The paper addresses how to keep improving models as human-written text becomes scarce, offering AI developers and researchers a post-processing method that needs no labeled data.

The paper improves model calibration without access to ground-truth labels. It proposes a label-free post-processing framework that corrects a strong model using a weaker but better-calibrated reference model, and in experiments on LLMs it achieves performance competitive with supervised baselines.

Villalobos et al. [2024] predict that publicly available human text will be exhausted within the next decade. Thus, improving models without access to ground-truth labels becomes increasingly important. We propose a label-free post-processing framework that improves a strong but miscalibrated model using a weaker yet better-calibrated reference. Our framework guarantees a strict performance improvement under any proper loss. Our approach is based on a characterization of when strict improvement is possible: when the strong and reference models are not mutually calibrated. We formalize this condition, connect it to arbitrage and no-trade results from economics, and develop an efficient Bregman projection algorithm that guarantees worst-case loss reduction without labels. Experiments on representative LLMs across varying scales demonstrate that our label-free method significantly reduces proper losses and calibration errors, achieving performance competitive with supervised baselines.
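The abstract pairs a guarantee "under any proper loss" with a "Bregman projection". The link between the two is a standard fact about proper scoring rules (Savage's representation): the excess expected loss of forecasting p when outcomes actually follow q equals a Bregman divergence D_φ(q, p), where φ is determined by the loss. The sketch below checks this numerically for the Brier score (φ is the squared norm, so D_φ is squared distance) and for log loss (φ is negative entropy, so D_φ is KL divergence). The distributions `q` and `p` and all helper names are illustrative assumptions; this is not the paper's projection algorithm, only the identity that makes Bregman geometry a natural setting for loss guarantees that hold across all proper losses.

```python
import numpy as np


def brier(p, y):
    """Multiclass Brier score: squared distance between forecast p and one-hot outcome y."""
    return float(np.sum((p - y) ** 2))


def log_loss(p, y):
    """Log loss of forecast p when the realized outcome is the one-hot vector y."""
    return float(-np.sum(y * np.log(p)))


def expected_loss(loss, p, q):
    """Expected loss of forecast p when the outcome is drawn from distribution q."""
    outcomes = np.eye(len(q))
    return sum(q[i] * loss(p, outcomes[i]) for i in range(len(q)))


def regret(loss, p, q):
    """Excess expected loss of reporting p instead of the true distribution q."""
    return expected_loss(loss, p, q) - expected_loss(loss, q, q)


def bregman(phi, grad_phi, x, y):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return float(phi(x) - phi(y) - np.dot(grad_phi(y), x - y))


# Generator for the Brier score: squared Euclidean norm.
def sq_norm(v):
    return float(np.dot(v, v))


def sq_norm_grad(v):
    return 2.0 * v


# Generator for log loss: negative Shannon entropy.
def neg_entropy(v):
    return float(np.sum(v * np.log(v)))


def neg_entropy_grad(v):
    return np.log(v) + 1.0


q = np.array([0.6, 0.3, 0.1])  # assumed "true" outcome distribution
p = np.array([0.2, 0.5, 0.3])  # an illustrative miscalibrated forecast

# Each pair should agree up to floating-point rounding.
print(regret(brier, p, q), bregman(sq_norm, sq_norm_grad, q, p))
print(regret(log_loss, p, q), bregman(neg_entropy, neg_entropy_grad, q, p))
```

Because the regret is a divergence, it is nonnegative and zero only when p equals q, which is why moving a forecast closer to the target distribution in the matching Bregman geometry lowers the expected loss.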
