Making Classifier Chains Resilient to Class Imbalance
The paper addresses class imbalance in multi-label data, a common problem in machine learning applications, but the contribution is incremental because it builds on the existing Ensemble of Classifier Chains (ECC) method.
The paper tackles class imbalance in multi-label learning by coupling Ensemble of Classifier Chains (ECC) with random undersampling, and by proposing two extensions that vary the number of binary models per label and the sizes of the chains. It reports that these approaches are effective on 16 multi-label datasets across a variety of evaluation metrics.
Class imbalance is an intrinsic characteristic of multi-label data. Most labels in multi-label datasets are associated with only a small number of training examples, far fewer than the size of the dataset. Class imbalance poses a key challenge that plagues most multi-label learning methods. Ensemble of Classifier Chains (ECC), one of the most prominent multi-label learning methods, is no exception to this rule, as each of the binary models it builds is trained from all positive and negative examples of a label. To make ECC resilient to class imbalance, we first couple it with random undersampling. We then present two extensions of this basic approach, where we build a varying number of binary models per label and construct chains of different sizes, in order to improve the exploitation of majority examples with approximately the same computational budget. Experimental results on 16 multi-label datasets demonstrate the effectiveness of the proposed approaches in a variety of evaluation metrics.
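To make the basic idea concrete, the following is a minimal sketch of an ensemble of classifier chains in which each binary model is trained on a randomly undersampled subset of the examples for its label. It is an illustration under stated assumptions, not the paper's implementation: the 1:1 balancing ratio, the nearest-centroid base learner, and all function and class names (`undersample`, `CentroidClassifier`, `fit_chain`, `fit_ecc`, and so on) are hypothetical choices made here to keep the example self-contained. The two extensions (a varying number of models per label and chains of different sizes) are not covered.

```python
import random


def undersample(y, rng):
    """Indices of all minority examples plus an equal-size random majority sample.

    The 1:1 ratio is an assumption made for this sketch.
    """
    pos = [i for i, v in enumerate(y) if v == 1]
    neg = [i for i, v in enumerate(y) if v == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    if not minority:  # only one class present, so nothing to balance
        return list(range(len(y)))
    return sorted(minority + rng.sample(majority, len(minority)))


class CentroidClassifier:
    """Stand-in binary base learner (assumption): predicts the nearest class centroid."""

    def fit(self, X, y):
        self.centroids = {}
        for c in set(y):
            rows = [x for x, v in zip(X, y) if v == c]
            self.centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=sq_dist)


def fit_chain(X, Y, rng):
    """Train one chain: random label order, one undersampled binary model per label."""
    order = list(range(len(Y[0])))
    rng.shuffle(order)
    models = []
    for pos, label in enumerate(order):
        # Extend the features with the true values of labels earlier in the chain.
        Xa = [list(x) + [row[l] for l in order[:pos]] for x, row in zip(X, Y)]
        y = [row[label] for row in Y]
        idx = undersample(y, rng)
        models.append(CentroidClassifier().fit([Xa[i] for i in idx],
                                               [y[i] for i in idx]))
    return order, models


def predict_chain(chain, x):
    """Predict labels along the chain, feeding each prediction to later models."""
    order, models = chain
    xa, pred = list(x), {}
    for label, model in zip(order, models):
        pred[label] = model.predict_one(xa)
        xa.append(pred[label])
    return [pred[l] for l in range(len(order))]


def fit_ecc(X, Y, n_chains=10, seed=0):
    """Train an ensemble of independently ordered, independently undersampled chains."""
    rng = random.Random(seed)
    return [fit_chain(X, Y, rng) for _ in range(n_chains)]


def predict_ecc(chains, x, threshold=0.5):
    """Average the chains' binary votes per label and threshold the result."""
    votes = [predict_chain(c, x) for c in chains]
    return [int(sum(col) / len(chains) >= threshold) for col in zip(*votes)]
```

Because every chain draws its own random sample of majority examples, the ensemble as a whole sees more of the majority class than any single balanced model does. A call such as `predict_ecc(fit_ecc(X, Y), x)` takes `X` as a list of numeric feature lists and `Y` as a list of 0/1 label lists, and returns one 0/1 prediction per label.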