CV AI LG | Mar 29, 2022

Online Continual Learning on a Contaminated Data Stream with Blurry Task Boundaries

Jihwan Bang, Hyunseo Koh, Seulki Park, Hwanjun Song, Jung-Woo Ha, Jonghyun Choi

arXiv:2203.15355v2 | 19.358 citations | h-index: 32 | Has Code

Originality: Incremental advance

AI Analysis

The paper addresses a practical real-world challenge for machine learning systems that must adapt to noisy, evolving data streams, and it represents an incremental improvement over existing continual learning approaches.

The paper tackles online continual learning with corrupted labels and blurry task boundaries. It proposes a method that balances diversity and purity in episodic memory, and reports that the method significantly outperforms prior methods on CIFAR10, CIFAR100, mini-WebVision, and Food-101N.

Learning under a continuously changing data distribution with incorrect labels is a desirable yet challenging real-world problem. A large body of continual learning (CL) methods, however, assumes data streams with clean labels, and online learning scenarios under noisy data streams remain underexplored. We consider a more practical CL task setup: online learning from a blurry data stream with corrupted labels, where existing CL methods struggle. To address the task, we first argue for the importance of both diversity and purity of examples in the episodic memory of continual learning models. To balance diversity and purity in the episodic memory, we propose a novel strategy to manage and use the memory through a unified approach of label-noise-aware diverse sampling and robust learning with semi-supervised learning. Our empirical validation on four datasets with real-world or synthetic noise (CIFAR10, CIFAR100, mini-WebVision, and Food-101N) shows that our method significantly outperforms prior art in this realistic and challenging continual learning scenario. Code and data splits are available at https://github.com/clovaai/puridiver.
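The trade-off between purity and diversity in a fixed-size episodic memory can be illustrated with a small sketch. The abstract does not give the scoring rule, so everything below is an assumption made for illustration: a high training loss is treated as a proxy for a likely mislabeled (impure) example, a high mean cosine similarity to same-label memory items is treated as a proxy for redundancy (low diversity), and the hypothetical weight `alpha` mixes the two. The function names are hypothetical and are not taken from the linked repository.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def eviction_scores(features, losses, labels, alpha=0.5):
    """Score every memory item; a higher score marks a better eviction candidate.

    Assumed rule (illustrative only):
        score_i = alpha * loss_i + (1 - alpha) * redundancy_i
    where redundancy_i is the mean cosine similarity between item i and the
    other memory items that carry the same (possibly noisy) label.
    In practice the two terms would need to be put on comparable scales.
    """
    scores = []
    for i, (f_i, y_i) in enumerate(zip(features, labels)):
        sims = [
            cosine(f_i, f_j)
            for j, (f_j, y_j) in enumerate(zip(features, labels))
            if j != i and y_j == y_i
        ]
        redundancy = sum(sims) / len(sims) if sims else 0.0
        scores.append(alpha * losses[i] + (1 - alpha) * redundancy)
    return scores


def pick_eviction(features, losses, labels, alpha=0.5):
    """Return the index to drop: the top-scoring item of the most frequent label.

    Restricting eviction to the largest class keeps the memory class-balanced.
    """
    largest_label = Counter(labels).most_common(1)[0][0]
    scores = eviction_scores(features, losses, labels, alpha)
    candidates = [i for i, y in enumerate(labels) if y == largest_label]
    return max(candidates, key=lambda i: scores[i])


# Toy memory: three "cat" items and one "dog" item with 2-D features.
demo_features = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.5, 0.5]]
demo_labels = ["cat", "cat", "cat", "dog"]
demo_losses = [0.1, 0.1, 2.0, 0.1]  # item 2 has a suspiciously high loss

if __name__ == "__main__":
    print(pick_eviction(demo_features, demo_losses, demo_labels, alpha=0.5))
    print(pick_eviction(demo_features, demo_losses, demo_labels, alpha=0.0))
```

With `alpha=0.5` the high-loss item is selected for eviction, which favors purity. With `alpha=0.0` the loss is ignored and the item most similar to its same-label neighbors is selected instead, which favors diversity. Choosing `alpha` is therefore the knob that trades one property against the other in this sketch.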
