Online Ensemble Learning for Imbalanced Data Streams
This work addresses imbalanced data in streaming environments, a problem relevant to machine learning practitioners, and represents an incremental advance that bridges two existing research areas.
The paper tackles online learning with imbalanced data streams by proposing a novel framework that fuses online ensemble algorithms with batch-mode cost-sensitive bagging/boosting methods. The resulting algorithms are theoretically sound, converge under certain conditions, and are validated on benchmark datasets.
Both cost-sensitive learning and online learning have been studied extensively, but few efforts address the two problems simultaneously. To address this challenge, this paper proposes a novel learning framework. The key idea is to fuse online ensemble algorithms with state-of-the-art batch-mode cost-sensitive bagging/boosting algorithms. The framework bridges two separately developed research areas and yields, for the first time, a family of theoretically sound online cost-sensitive bagging and online cost-sensitive boosting algorithms. Unlike other online cost-sensitive learning algorithms, which lack a theoretical analysis of their asymptotic properties, the proposed algorithms are guaranteed to converge under certain conditions, and experiments on benchmark data sets validate their effectiveness and efficiency.
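To make the general idea concrete, the following is a minimal sketch of one way an online bagging scheme might be made cost-sensitive. It is not the algorithm proposed in the paper. It assumes the well-known online bagging scheme in which each base learner sees each incoming example k times with k drawn from a Poisson distribution, and it swaps the usual rate of 1 for a class-dependent rate so that minority-class examples are replicated more often. The names `poisson`, `OnlineGaussianNB`, `OnlineCostSensitiveBagging`, and the `rates` parameter are hypothetical, introduced only for illustration.

```python
import math
import random


def poisson(lam, rng):
    """Draw from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


class OnlineGaussianNB:
    """Tiny incremental Gaussian naive Bayes (hypothetical base learner)."""

    def __init__(self):
        self.n = {}   # class -> number of updates seen
        self.s = {}   # class -> per-feature running sum
        self.ss = {}  # class -> per-feature running sum of squares

    def update(self, x, y):
        if y not in self.n:
            self.n[y] = 0
            self.s[y] = [0.0] * len(x)
            self.ss[y] = [0.0] * len(x)
        self.n[y] += 1
        for j, v in enumerate(x):
            self.s[y][j] += v
            self.ss[y][j] += v * v

    def predict(self, x):
        total = sum(self.n.values())
        best, best_score = None, -math.inf
        for y, n in self.n.items():
            score = math.log(n / total)  # log prior from update counts
            for j, v in enumerate(x):
                mean = self.s[y][j] / n
                var = max(self.ss[y][j] / n - mean * mean, 1e-6)
                score += -0.5 * math.log(2 * math.pi * var)
                score -= (v - mean) ** 2 / (2 * var)
            if score > best_score:
                best, best_score = y, score
        return best  # None if the learner has seen no data


class OnlineCostSensitiveBagging:
    """Online bagging with a class-dependent Poisson rate (an assumption)."""

    def __init__(self, n_learners, rates, seed=0):
        self.learners = [OnlineGaussianNB() for _ in range(n_learners)]
        self.rates = rates  # class label -> Poisson rate; default is 1.0
        self.rng = random.Random(seed)

    def update(self, x, y):
        lam = self.rates.get(y, 1.0)
        for learner in self.learners:
            # Each learner sees the example k ~ Poisson(lam) times.
            for _ in range(poisson(lam, self.rng)):
                learner.update(x, y)

    def predict(self, x):
        votes = {}
        for learner in self.learners:
            label = learner.predict(x)
            if label is not None:
                votes[label] = votes.get(label, 0) + 1
        return max(votes, key=votes.get) if votes else None


if __name__ == "__main__":
    # Toy stream: class 1 is the minority and gets a higher rate.
    model = OnlineCostSensitiveBagging(n_learners=5, rates={0: 1.0, 1: 5.0})
    for i in range(90):
        model.update([float(i % 3 - 1)], 0)
        if i % 10 == 0:
            model.update([float(4 + (i // 10) % 3)], 1)
    print(model.predict([5.0]), model.predict([0.0]))
```

In this sketch the only cost-sensitive ingredient is the `rates` mapping: a higher rate for a class acts like oversampling that class inside every base learner, while a rate below 1 acts like undersampling. How the rates should be chosen, and under which conditions such a scheme converges to its batch counterpart, is exactly the kind of question the paper's analysis addresses and is not settled by this example.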