Imbalanced Continual Learning with Partitioning Reservoir Sampling
This work addresses a crucial challenge for machine learning systems that must learn sequentially from imbalanced multi-label data, though the contribution is incremental in that it builds on existing replay-based approaches.
The paper tackles continual learning in multi-label classification with long-tailed label distributions. It identifies destructive forgetting of minority concepts and proposes Partitioning Reservoir Sampling (PRS) to maintain balanced knowledge of head and tail classes, achieving competitive performance on the curated COCOseq and NUS-WIDEseq benchmarks.
Continual learning from a sequential stream of data is a crucial challenge for machine learning research. Most studies on this topic have been conducted under the single-label classification setting, along with an assumption of balanced label distribution. This work expands this research horizon towards multi-label classification. In doing so, we identify an unanticipated adversity inherent in many multi-label datasets: the long-tailed distribution. We jointly address two problems that have so far been solved independently, catastrophic forgetting and the long-tailed label distribution, by first empirically showing a new challenge of destructive forgetting of the minority concepts on the tail. Then, we curate two benchmark datasets, COCOseq and NUS-WIDEseq, that allow the study of both intra- and inter-task imbalances. Lastly, we propose a new sampling strategy for replay-based approaches named Partitioning Reservoir Sampling (PRS), which allows the model to maintain a balanced knowledge of both head and tail classes. We publicly release the dataset and the code on our project page.
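The abstract does not describe how PRS works internally, so the following Python sketch only illustrates the general idea behind a replay memory that resists long-tailed imbalance, and should not be read as the authors' algorithm. It shows classic reservoir sampling (Algorithm R), which keeps a uniform sample of a stream and therefore inherits the stream's imbalance, followed by a hypothetical helper, `target_partition_sizes`, that divides a fixed memory among classes in proportion to their running counts raised to an exponent `rho`. The helper name, the `rho` parameter, and the allocation rule are all assumptions made for illustration.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Classic reservoir sampling (Algorithm R).

    Keeps a uniform random sample of k items from a stream of unknown length.
    """
    rng = rng or random.Random()
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if len(reservoir) < k:
            # Fill the reservoir until it holds k items.
            reservoir.append(item)
        else:
            # Replace a stored item with probability k / n.
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = item
    return reservoir


def target_partition_sizes(class_counts, memory_size, rho=0.0):
    """Hypothetical helper (an assumption, not taken from the source).

    Splits memory_size among classes in proportion to count ** rho.
    rho = 0 gives every seen class an equal share; rho = 1 mirrors the
    stream's own class frequencies.
    """
    weights = {c: n ** rho for c, n in class_counts.items() if n > 0}
    total = sum(weights.values())
    return {c: memory_size * w / total for c, w in weights.items()}


if __name__ == "__main__":
    counts = {"head": 900, "tail": 100}
    print(target_partition_sizes(counts, memory_size=100, rho=0.0))
    print(target_partition_sizes(counts, memory_size=100, rho=1.0))
```

With `rho=0.0` the head and tail classes receive equal shares of the memory even though the head class is nine times more frequent in the stream, while `rho=1.0` reproduces the stream's imbalance, which is what a plain reservoir would converge to. Multi-label samples, where one stored item counts towards several classes at once, would need extra bookkeeping that this sketch leaves out.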