LG, AI, CV, ML | Sep 8, 2020

Imbalanced Continual Learning with Partitioning Reservoir Sampling

arXiv:2009.03632v1 | 131 citations
Originality: Incremental advance
AI Analysis

This work addresses a crucial challenge for machine learning systems that need to learn sequentially from imbalanced multi-label data, though it is incremental as it builds on replay-based approaches.

The paper tackles continual learning for multi-label classification under long-tailed label distributions. It identifies destructive forgetting of minority concepts and proposes Partitioning Reservoir Sampling (PRS) to maintain balanced knowledge, achieving competitive performance on the curated benchmarks COCOseq and NUS-WIDEseq.

Continual learning from a sequential stream of data is a crucial challenge for machine learning research. Most studies on this topic have been conducted under the single-label classification setting, along with an assumption of balanced label distribution. This work expands this research horizon towards multi-label classification. In doing so, we identify an unanticipated adversity innately present in many multi-label datasets: the long-tailed distribution. We jointly address two problems that have so far been solved independently, catastrophic forgetting and the long-tailed label distribution, by first empirically showing a new challenge of destructive forgetting of the minority concepts on the tail. Then, we curate two benchmark datasets, COCOseq and NUS-WIDEseq, that allow the study of both intra- and inter-task imbalances. Lastly, we propose a new sampling strategy for replay-based approaches named Partitioning Reservoir Sampling (PRS), which allows the model to maintain a balanced knowledge of both head and tail classes. We publicly release the dataset and the code on our project page.
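To make the idea of a class-balanced replay memory concrete, the following is a minimal sketch, not the authors' PRS algorithm. It assumes one label per sample and equal-sized partitions, and it fills each partition with standard reservoir sampling (Algorithm R). The class name `PartitionedReservoir` and its parameters are hypothetical and chosen only for illustration; the actual PRS procedure, including how it sizes partitions and handles multi-label samples, is described in the paper and released code.

```python
import random
from collections import defaultdict


class PartitionedReservoir:
    """Illustrative replay memory: one fixed-size partition per class,
    each maintained by standard reservoir sampling (Algorithm R).
    Simplifying assumptions: single label per sample, equal partition sizes."""

    def __init__(self, capacity_per_class, seed=0):
        self.capacity = capacity_per_class
        self.rng = random.Random(seed)
        self.partitions = defaultdict(list)  # label -> stored samples
        self.seen = defaultdict(int)         # label -> samples seen so far

    def add(self, sample, label):
        self.seen[label] += 1
        part = self.partitions[label]
        if len(part) < self.capacity:
            # Partition not full yet: always keep the sample.
            part.append(sample)
            return
        # Partition full: keep the new sample with probability
        # capacity / seen[label], replacing a uniformly chosen slot.
        j = self.rng.randrange(self.seen[label])
        if j < self.capacity:
            part[j] = sample

    def sizes(self):
        return {label: len(part) for label, part in self.partitions.items()}


# A long-tailed stream: 1000 "head" samples, then only 10 "tail" samples.
buf = PartitionedReservoir(capacity_per_class=5)
for i in range(1000):
    buf.add(f"head_{i}", "head")
for i in range(10):
    buf.add(f"tail_{i}", "tail")
print(buf.sizes())
```

Under these assumptions the tail class keeps as many memory slots as the head class despite being 100 times rarer in the stream, whereas a single shared reservoir would store samples roughly in proportion to class frequency.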

Code Implementations: 1 repo
