Theoretical Guarantees of Deep Embedding Losses Under Label Noise
This work addresses label noise in deep learning for tasks where data labeling is costly or impractical, offering incremental theoretical insights.
The paper tackles the problem of deep embedding learning with unreliable labels by providing theoretical guarantees for marginal and triplet losses under label noise, showing how sampling strategies and initialization affect noise resistance.
Collecting labeled data to train deep neural networks is costly and, for many tasks, impractical. Research effort has therefore focused on automatically curated datasets and on unsupervised and weakly supervised learning. A common problem in these directions is learning with unreliable label information. In this paper, we address the tolerance of deep embedding learning losses to label noise, i.e., the case where the observed labels differ from the true labels. Specifically, we provide sufficient conditions for theoretical guarantees for two common loss functions: marginal loss and triplet loss. From these theoretical results, we can estimate how sampling strategies and initialization affect the level of resistance to label noise. The analysis also helps provide more effective guidelines for unsupervised and weakly supervised deep embedding learning.
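For readers unfamiliar with the two losses named above, the following is a minimal sketch and not the paper's own formulation. It assumes the widely used hinge-style definitions: a triplet loss over (anchor, positive, negative) embeddings, and a pairwise margin-based loss with a boundary `beta` and margin `alpha`. The function names, default parameter values, and the uniform label-flipping noise model in `flip_labels` are all illustrative assumptions.

```python
import math
import random


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Assumed standard form: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def marginal_loss(x_i, x_j, same_class, alpha=0.2, beta=1.2):
    """Assumed pairwise margin-based form: max(0, alpha + y * (d - beta)),
    with y = +1 for a same-class pair and y = -1 otherwise."""
    y = 1.0 if same_class else -1.0
    return max(0.0, alpha + y * (euclidean(x_i, x_j) - beta))


def flip_labels(labels, num_classes, noise_rate, rng):
    """Hypothetical noise model: with probability noise_rate, replace a label
    by a different class chosen uniformly at random."""
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            noisy.append(rng.choice([c for c in range(num_classes) if c != y]))
        else:
            noisy.append(y)
    return noisy
```

Under this sketch, label noise enters both losses only through which pairs or triplets are treated as same-class or different-class, which is why the choice of sampling strategy matters: a flipped label changes the sign `y` in the pairwise loss, or swaps the roles of positive and negative in a triplet.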