Towards Label Imbalance in Multi-label Classification with Many Labels
This work addresses a critical issue for researchers and practitioners in multi-label classification, although it is incremental in that it builds on existing scalable methods.
The paper tackles the label imbalance problem in multi-label classification with many labels, a problem that existing scalable algorithms ignore or even worsen. It proposes a novel Representation-based Multi-label Learning with Sampling (RMLS) approach and demonstrates its effectiveness on real-world datasets.
In multi-label classification, an instance may be associated with a set of labels simultaneously. Recently, research on multi-label classification has largely shifted its focus to the other end of the spectrum, where the number of labels is assumed to be extremely large. Existing works focus on designing scalable algorithms that offer fast training and a small memory footprint. However, they ignore, and even compound, another challenge: the label imbalance problem. To address this drawback, we propose a novel Representation-based Multi-label Learning with Sampling (RMLS) approach. To the best of our knowledge, we are the first to tackle the imbalance problem in multi-label classification with many labels. Our experiments on real-world datasets demonstrate the effectiveness of the proposed approach.
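The abstract does not say how RMLS measures or corrects imbalance. To make "label imbalance" concrete, the purely illustrative sketch below (not the RMLS method) computes two common diagnostics from a dataset in which each instance carries a set of labels. The first is the per-label imbalance ratio: the positive count of the most frequent label divided by each label's positive count. The second is the within-label negative-to-positive ratio. The function name `label_imbalance` and the list-of-sets input format are assumptions chosen for illustration. With many labels, most labels typically have very few positives, so both ratios grow large for the long tail.

```python
from collections import Counter


def label_imbalance(label_sets, num_labels):
    """Illustrative imbalance diagnostics (hypothetical helper, not RMLS).

    label_sets: one set of label indices in 0..num_labels-1 per instance.
    Returns (positive counts, per-label imbalance ratio,
    within-label negative-to-positive ratio), each indexed by label.
    """
    n = len(label_sets)
    # Count how many instances are positive for each label.
    counts = Counter(label for labels in label_sets for label in labels)
    pos = [counts[j] for j in range(num_labels)]
    most_frequent = max(pos)
    # Between-label imbalance: labels with no positives get an infinite ratio.
    ir = [most_frequent / c if c else float("inf") for c in pos]
    # Within-label imbalance: negatives per positive for each label.
    neg_to_pos = [(n - c) / c if c else float("inf") for c in pos]
    return pos, ir, neg_to_pos


# Toy dataset: 6 instances, 4 labels; label 3 never occurs.
data = [{0, 1}, {0}, {0, 2}, {0}, {1}, {0}]
pos, ir, neg_to_pos = label_imbalance(data, num_labels=4)
print(pos)         # [5, 2, 1, 0]
print(ir)          # [1.0, 2.5, 5.0, inf]
print(neg_to_pos)  # [0.2, 2.0, 5.0, inf]
```

Even in this toy example, a single frequent label dominates while the rarer labels have few or no positives. This is the kind of skew that grows as the label space becomes very large.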