Dimensionality-Driven Learning with Noisy Labels
This work addresses noisy labels in training datasets, a common problem for machine learning practitioners, and offers a method that makes training more robust to them.
The paper tackles the problem of training deep neural networks on datasets with noisy labels by analyzing the dimensionality of representation subspaces, and demonstrates that its dimensionality-driven learning strategy is highly tolerant to significant proportions of noisy labels.
Datasets with significant proportions of noisy (incorrect) class labels present challenges for training accurate Deep Neural Networks (DNNs). We propose a new perspective for understanding DNN generalization for such datasets, by investigating the dimensionality of the deep representation subspace of training samples. We show that from a dimensionality perspective, DNNs exhibit quite distinctive learning styles when trained with clean labels versus when trained with a proportion of noisy labels. Based on this finding, we develop a new dimensionality-driven learning strategy, which monitors the dimensionality of subspaces during training and adapts the loss function accordingly. We empirically demonstrate that our approach is highly tolerant to significant proportions of noisy labels, and can effectively learn low-dimensional local subspaces that capture the data distribution.
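To make "monitoring the dimensionality of subspaces" concrete, here is a minimal sketch of one way to estimate the local dimensionality around each sample in a batch of representations. The abstract does not say which measure is used, so the choice here is an assumption: the maximum-likelihood estimator of local intrinsic dimensionality (LID), computed from each point's k nearest-neighbor distances. The function name `lid_mle` and the parameter `k` are hypothetical and introduced only for illustration.

```python
import math
import random


def lid_mle(points, k=20):
    """Estimate local intrinsic dimensionality for every point in a batch.

    Assumed estimator (not taken from the abstract):
        LID(x) = -1 / mean_i( log(r_i / r_k) ),
    where r_1 <= ... <= r_k are the distances from x to its k nearest
    neighbors within the same batch.
    """
    estimates = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        knn = dists[:k]
        r_max = knn[-1]
        if r_max == 0.0:
            # All k neighbors coincide with p; treat as zero-dimensional.
            estimates.append(0.0)
            continue
        # Skip exact duplicates of p to avoid log(0).
        logs = [math.log(r / r_max) for r in knn if r > 0.0]
        mean_log = sum(logs) / len(logs)
        estimates.append(-1.0 / mean_log if mean_log < 0.0 else float("inf"))
    return estimates


if __name__ == "__main__":
    rng = random.Random(0)
    # 2-D data embedded in a 10-D space (last 8 coordinates are zero).
    low = [[rng.gauss(0, 1), rng.gauss(0, 1)] + [0.0] * 8 for _ in range(200)]
    # Data that genuinely fills all 10 dimensions.
    high = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(200)]
    print("mean LID, 2-D subspace:", sum(lid_mle(low)) / len(low))
    print("mean LID, full 10-D:   ", sum(lid_mle(high)) / len(high))
```

In a training loop, a score like this could be averaged over a batch of hidden-layer activations at the end of each epoch and tracked over time. How that signal should adapt the loss is not specified in the abstract, so it is left out of the sketch.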