LGSTAT-MECHAIMLJun 1, 2024

Stochastic Resetting Mitigates Latent Gradient Bias of SGD from Label Noise

arXiv:2406.00396v36 citations
Originality Incremental advance
AI Analysis

This work addresses the challenge of noisy labels in deep learning, which is a common issue in real-world datasets, by introducing a method that mitigates overfitting and enhances model robustness, though it is incremental as it builds on existing SGD and noisy label handling techniques.

The paper tackles the problem of training deep neural networks with noisy labels, where models initially learn general patterns but then overfit to corrupted data, and shows that applying stochastic resetting to SGD significantly improves generalization performance, with empirical validation confirming these gains.

Giving up and starting over may seem wasteful in many situations such as searching for a target or training deep neural networks (DNNs). Our study, though, demonstrates that resetting from a checkpoint can significantly improve generalization performance when training DNNs with noisy labels. In the presence of noisy labels, DNNs initially learn the general patterns of the data but then gradually memorize the corrupted data, leading to overfitting. By deconstructing the dynamics of stochastic gradient descent (SGD), we identify the behavior of a latent gradient bias induced by noisy labels, which harms generalization. To mitigate this negative effect, we apply the stochastic resetting method to SGD, inspired by recent developments in the field of statistical physics achieving efficient target searches. We first theoretically identify the conditions where resetting becomes beneficial, and then we empirically validate our theory, confirming the significant improvements achieved by resetting. We further demonstrate that our method is both easy to implement and compatible with other methods for handling noisy labels. Additionally, this work offers insights into the learning dynamics of DNNs from an interpretability perspective, expanding the potential to analyze training methods through the lens of statistical physics.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes