LG CR ML | Jun 17, 2024

Retraining with Predicted Hard Labels Provably Increases Model Accuracy

arXiv:2406.11206v3 | 4 citations
Originality: Incremental advance
AI Analysis

This work addresses training with noisy labels, particularly in applications such as local label differential privacy, by providing a theoretical foundation and a practical improvement. It is incremental in that it builds on existing retraining ideas.

The paper tackles training with noisy labels by theoretically analyzing retraining with predicted hard labels. It proves that retraining can improve population accuracy in a linearly separable binary classification setting, and it shows empirically that consensus-based retraining improves label differential privacy training, with an accuracy gain of more than 6% for ResNet-18 on CIFAR-100 at ε=3.

The performance of a model trained with noisy labels is often improved by simply retraining the model with its own predicted hard labels (i.e., 1/0 labels). Yet, a detailed theoretical characterization of this phenomenon is lacking. In this paper, we theoretically analyze retraining in a linearly separable binary classification setting with randomly corrupted labels given to us and prove that retraining can improve the population accuracy obtained by initially training with the given (noisy) labels. To the best of our knowledge, this is the first such theoretical result. Retraining finds application in improving training with local label differential privacy (DP), which involves training with noisy labels. We empirically show that retraining selectively on the samples for which the predicted label matches the given label significantly improves label DP training at no extra privacy cost; we call this consensus-based retraining. As an example, when training ResNet-18 on CIFAR-100 with ε=3 label DP, we obtain more than 6% improvement in accuracy with consensus-based retraining.
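
The two retraining variants described in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation: the least-squares linear classifier, the ±1 label encoding, the synthetic data, and all function names (`fit_linear`, `predict_hard`, `full_retrain`, `consensus_retrain`) are assumptions made for this example. Full retraining refits on the model's own predicted hard labels for every sample; consensus-based retraining refits only on samples whose predicted label matches the given (noisy) label.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares linear classifier on +/-1 labels (a stand-in model, assumed here)."""
    w, *_ = np.linalg.lstsq(X, y.astype(float), rcond=None)
    return w


def predict_hard(w, X):
    """Hard labels in {-1, +1} from the sign of the linear score."""
    return np.where(X @ w >= 0, 1, -1)


def full_retrain(X, y_noisy):
    """Train on the noisy labels, then retrain on predicted hard labels for all samples."""
    w0 = fit_linear(X, y_noisy)
    y_pred = predict_hard(w0, X)
    w1 = fit_linear(X, y_pred)
    return w0, w1


def consensus_retrain(X, y_noisy):
    """Train on the noisy labels, then retrain only where prediction and given label agree."""
    w0 = fit_linear(X, y_noisy)
    y_pred = predict_hard(w0, X)
    keep = y_pred == y_noisy  # the consensus set
    w1 = fit_linear(X[keep], y_pred[keep])
    return w0, w1, keep


if __name__ == "__main__":
    # Synthetic linearly separable data with randomly flipped labels (all values assumed).
    rng = np.random.default_rng(0)
    n, d, flip_prob = 2000, 20, 0.3
    w_star = np.zeros(d)
    w_star[0] = 1.0
    X = rng.standard_normal((n, d))
    y_clean = np.where(X @ w_star >= 0, 1, -1)
    y_noisy = np.where(rng.random(n) < flip_prob, -y_clean, y_clean)

    X_test = rng.standard_normal((5000, d))
    y_test = np.where(X_test @ w_star >= 0, 1, -1)

    def accuracy(w):
        return float(np.mean(predict_hard(w, X_test) == y_test))

    w0, w_full = full_retrain(X, y_noisy)
    _, w_cons, keep = consensus_retrain(X, y_noisy)
    print("initial training:  ", accuracy(w0))
    print("full retraining:   ", accuracy(w_full))
    print("consensus retrain: ", accuracy(w_cons), "on", int(keep.sum()), "samples")
```

Neither retraining step reads anything beyond the features and the already-released noisy labels, which is consistent with the abstract's statement that consensus-based retraining incurs no extra privacy cost. Whether retraining helps in this toy setup depends on the noise level, dimension, and sample size chosen above.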

