LG MLFeb 5, 2024

Boosting, Voting Classifiers and Randomized Sample Compression Schemes

Arthur da Cunha, Kasper Green Larsen, Martin Ritzert

arXiv:2402.02976v26.43 citationsh-index: 13ALT

Originality Incremental advance

AI Analysis

This work addresses a long-standing theoretical bottleneck in machine learning for researchers and practitioners using boosting methods, representing a significant but incremental improvement over prior bounds.

The paper tackles the problem of sub-optimal theoretical bounds in boosting algorithms, such as AdaBoost, by proposing a randomized boosting algorithm that reduces the generalization error to a single logarithmic dependency on sample size, achieving a more efficient learning framework.

In boosting, we aim to leverage multiple weak learners to produce a strong learner. At the center of this paradigm lies the concept of building the strong learner as a voting classifier, which outputs a weighted majority vote of the weak learners. While many successful boosting algorithms, such as the iconic AdaBoost, produce voting classifiers, their theoretical performance has long remained sub-optimal: The best known bounds on the number of training examples necessary for a voting classifier to obtain a given accuracy has so far always contained at least two logarithmic factors above what is known to be achievable by general weak-to-strong learners. In this work, we break this barrier by proposing a randomized boosting algorithm that outputs voting classifiers whose generalization error contains a single logarithmic dependency on the sample size. We obtain this result by building a general framework that extends sample compression methods to support randomized learning algorithms based on sub-sampling.

View on arXiv PDF

Similar