LG · May 20

Same Target, Different Basins: Hard vs. Soft Labels for Annotator Distributions

arXiv:2605.20642 · 16.2

Predicted impact: top 85% in LG · last 90 days

Originality: Incremental advance

AI Analysis

For practitioners dealing with noisy labels from multiple annotators, this work provides practical guidance on when to use hard-label methods over soft-label training, especially in low-annotation regimes.

The paper investigates hard-label delivery methods (multipass and stochastic label sampling) for training with annotator disagreement, finding that they outperform soft-label training when few annotations per example are available, and match it when full distributions are available. On CIFAR-10H, hard-label methods yield larger improvements when the sparse empirical target is farther from the full distribution.

When annotators disagree, that disagreement can reflect epistemic uncertainty rather than simple label noise. We study hard-label delivery as an alternative to the usual choices of collapsing votes to a single label or training directly on the empirical soft-label distribution. We focus on two primary hard-label methods: multipass, which cycles through observed votes while keeping the dataset size fixed, and stochastic label sampling (SLS), which samples one label per example at the start of each epoch. On CIFAR-10H, we find that when only a small number of annotations per example is available, hard-label delivery improves over soft-label training, with larger improvements where the sparse empirical target is farther from the full annotator distribution. When full annotator distributions are available, both hard-label methods match soft-label training. We use deterministic control as an ablation of multipass and shuffled SLS as a control that breaks the example-to-distribution match. We also show that SLS and soft-label cross-entropy optimize the same expected objective. Hard-label delivery also converges to flatter basins, with supporting descriptive evidence from OOD detection on SVHN and CIFAR-100. Overall, these results suggest that multipass is a strong practical default when raw vote counts are available, while SLS offers a lightweight alternative that remains competitive when only a few votes per example are available and matches soft-label training when full annotator distributions are available.
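The two delivery schemes and the shared-objective claim can be illustrated with a small sketch. This is not the authors' code: the function names, the toy vote counts, the fixed predicted probabilities, and the reading of multipass as "use the (epoch mod k)-th observed vote" are all assumptions made for illustration. With the model's predictions held fixed, the sketch compares the soft-label cross-entropy against the average hard-label loss under each scheme.

```python
import math
import random

def soft_label_ce(log_probs, votes):
    """Cross-entropy against the empirical vote distribution of one example."""
    total = sum(votes)
    return -sum((v / total) * lp for v, lp in zip(votes, log_probs) if v > 0)

def sls_label(votes, rng):
    """SLS (as assumed here): draw one hard label in proportion to the votes."""
    return rng.choices(range(len(votes)), weights=votes, k=1)[0]

def multipass_label(vote_list, epoch):
    """Multipass (as assumed here): cycle through the observed votes by epoch."""
    return vote_list[epoch % len(vote_list)]

# Hypothetical example: 4 annotators, 3 classes, votes 3/1/0.
votes = [3, 1, 0]
vote_list = [0, 0, 0, 1]          # the same votes, listed one per annotator
probs = [0.7, 0.2, 0.1]           # fixed model predictions (illustrative)
log_probs = [math.log(p) for p in probs]

soft = soft_label_ce(log_probs, votes)

# Average hard-label loss over one full multipass cycle.
multipass_avg = sum(
    -log_probs[multipass_label(vote_list, e)] for e in range(len(vote_list))
) / len(vote_list)

# Monte Carlo estimate of the expected hard-label loss under SLS.
rng = random.Random(0)
n_draws = 20000
sls_avg = sum(-log_probs[sls_label(votes, rng)] for _ in range(n_draws)) / n_draws
```

At fixed predictions, `multipass_avg` equals `soft` exactly after one full cycle, and `sls_avg` approaches it as the number of draws grows, since the expected one-hot target under vote-proportional sampling is the empirical vote distribution. The per-step gradients still differ between the schemes, which is where any difference in training dynamics would have to come from.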
