Locally Adaptive Label Smoothing for Predictive Churn
This work tackles the practical problem of prediction churn in neural networks, which is undesirable for practitioners relying on consistent model behavior.
The paper addresses the problem of prediction churn in neural networks, where re-trainings of the same model yield different predictions despite reaching similar accuracies. The authors propose locally adaptive label smoothing, which smooths each example's label based on the labels of its neighboring examples, and show that it often reduces churn more than several baselines while also improving accuracy across a variety of classification tasks.
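To make the notion of churn concrete, one simple way to measure it between two trained models is the fraction of evaluation examples on which their predicted labels differ. The sketch below is illustrative only: it assumes predictions are already integer class labels, and the function names `churn` and `accuracy` are hypothetical helpers, not part of the paper. The toy example shows two models with identical accuracy that still disagree on half of the examples.

```python
import numpy as np


def accuracy(preds, labels):
    """Fraction of predictions that match the true labels (hypothetical helper)."""
    return float(np.mean(np.asarray(preds) == np.asarray(labels)))


def churn(preds_a, preds_b):
    """Fraction of examples on which two models' predicted labels differ (hypothetical helper)."""
    preds_a = np.asarray(preds_a)
    preds_b = np.asarray(preds_b)
    if preds_a.shape != preds_b.shape:
        raise ValueError("prediction arrays must have the same shape")
    return float(np.mean(preds_a != preds_b))


# Toy example: two "re-trainings" with equal accuracy but different mistakes.
labels = [0, 1, 0, 1]
run_a = [0, 1, 1, 1]  # wrong on example 2
run_b = [0, 1, 0, 0]  # wrong on example 3
print(accuracy(run_a, labels), accuracy(run_b, labels))  # 0.75 0.75
print(churn(run_a, run_b))  # 0.5
```

Equal accuracy says nothing about *which* examples each model gets wrong, which is why churn has to be measured separately.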
Training modern neural networks is an inherently noisy process that can lead to high prediction churn (disagreements between re-trainings of the same model due to factors such as randomization in the parameter initialization and mini-batches), even when the trained models all attain similar accuracies. Such prediction churn can be very undesirable in practice. In this paper, we present several baselines for reducing churn and show that training on soft labels obtained by adaptively smoothing each example's label based on the example's neighboring labels often outperforms the baselines on churn while improving accuracy on a variety of benchmark classification tasks and model architectures.
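The neighbor-based smoothing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: neighbors are found by Euclidean distance in some feature space (here simply the raw inputs), and the hypothetical parameters `k` and `alpha` set the neighborhood size and the mixing weight. Each one-hot label is mixed with the average one-hot label of its `k` nearest neighbors, so an example whose neighbors share its label keeps a one-hot target, while an example in a mixed region receives a softer target. That per-example variation is what makes the smoothing locally adaptive.

```python
import numpy as np


def knn_smoothed_labels(features, labels, num_classes, k=5, alpha=0.5):
    """Hypothetical sketch of neighbor-based (locally adaptive) label smoothing.

    Assumptions: Euclidean k-NN on `features`, each example excluded from its
    own neighbor set, and a single global mixing weight `alpha` in [0, 1].
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    n = features.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of examples")

    one_hot = np.eye(num_classes)[labels]  # (n, num_classes)

    # Pairwise squared Euclidean distances, shape (n, n).
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    np.fill_diagonal(dists, np.inf)  # an example is not its own neighbor

    neighbor_idx = np.argsort(dists, axis=1)[:, :k]     # (n, k)
    neighbor_avg = one_hot[neighbor_idx].mean(axis=1)   # (n, num_classes)

    # Mix each example's own label with its neighbors' average label.
    return (1.0 - alpha) * one_hot + alpha * neighbor_avg


# Toy 1-D example: example 2 (label 1) sits among label-0 points.
feats = [[0.0], [0.1], [0.2], [5.0], [5.1]]
labs = [0, 0, 1, 1, 1]
soft = knn_smoothed_labels(feats, labs, num_classes=2, k=2, alpha=0.5)
print(soft[0])  # [0.75 0.25]
print(soft[2])  # [0.5 0.5]
print(soft[3])  # [0. 1.]
```

The pairwise distance matrix is O(n^2) in memory, which is fine for a sketch; a larger dataset would call for an approximate or tree-based nearest-neighbor search instead.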