LG May 21, 2024

Can We Treat Noisy Labels as Accurate?

Yuxiang Zheng, Zhongyi Han, Yilong Yin, Xin Gao, Tongliang Liu

arXiv:2405.12969v24.62 citations h-index: 8 Has Code

Originality: Highly original

AI Analysis

EchoAlign addresses the challenge of noisy labels for machine learning practitioners, particularly in high-noise environments, and represents a novel approach rather than an incremental improvement.

The paper tackles noisy labels in machine learning by proposing EchoAlign, which treats noisy labels as accurate and modifies instances to align with them instead of correcting the labels. The authors report superior accuracy and robustness. Under 30% instance-dependent noise, EchoSelect retains nearly twice as many correctly labeled samples as previous methods while maintaining 99% selection accuracy.

Noisy labels significantly hinder the accuracy and generalization of machine learning models, particularly when resulting from ambiguous instance features that complicate correct labeling. Traditional approaches, such as those relying on transition matrices for label correction, often struggle to effectively resolve such ambiguity, due to their inability to capture complex relationships between instances and noisy labels. In this paper, we propose EchoAlign, a paradigm shift in learning from noisy labels. Unlike previous methods that attempt to correct labels, EchoAlign treats noisy labels ($\tilde{Y}$) as accurate and modifies corresponding instances ($X$) to better align with these labels. The EchoAlign framework comprises two main components: (1) EchoMod leverages controllable generative models to selectively modify instance features, achieving alignment with noisy labels while preserving intrinsic instance characteristics such as shape, texture, and semantic identity. (2) EchoSelect mitigates distribution shifts introduced by instance modifications by strategically retaining a substantial subset of original instances with correct labels. Specifically, EchoSelect exploits feature similarity distributions between original and modified instances to accurately distinguish between correctly and incorrectly labeled samples. Extensive experiments across three benchmark datasets demonstrate that EchoAlign significantly outperforms state-of-the-art methods, particularly in high-noise environments, achieving superior accuracy and robustness. Notably, under 30% instance-dependent noise, EchoSelect retains nearly twice the number of correctly labeled samples compared to previous methods, maintaining 99% selection accuracy, thereby clearly illustrating the effectiveness of EchoAlign. The implementation of EchoAlign is publicly available at https://github.com/KevinCarpricorn/EchoAlign/tree/main.
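The selection idea behind EchoSelect can be illustrated with a small sketch. One plausible reading of the abstract is that an instance whose label is already correct changes little when modified to match that label, so its original and modified features stay similar, whereas a mislabeled instance changes a lot. The code below is only an illustration under that assumption: the use of cosine similarity, the fixed `threshold`, and the function names `cosine_similarity` and `select_by_similarity` are hypothetical and are not taken from the paper or its repository.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Guard against zero vectors, which have no defined direction.
    return dot / norm if norm > 0 else 0.0


def select_by_similarity(orig_feats, mod_feats, threshold=0.9):
    """Flag instances whose original and modified features stay similar.

    Assumption (not from the paper): a fixed cosine-similarity threshold
    decides which original instances are kept as "likely correctly labeled".
    """
    sims = [cosine_similarity(u, v) for u, v in zip(orig_feats, mod_feats)]
    keep = [s >= threshold for s in sims]
    return keep, sims


# Toy features: rows 0 and 2 barely change after modification, row 1 changes a lot.
orig = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mod = [[1.0, 0.1], [1.0, 0.0], [1.0, 1.0]]
keep_mask, sims = select_by_similarity(orig, mod, threshold=0.9)
```

In this toy run the second instance would be treated as likely mislabeled, and its modified version would be used for training in place of the original. In practice the features would come from some encoder and the threshold would need to be chosen from the observed similarity distribution; both are left open here.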
