LG · Aug 2, 2023

When Analytic Calculus Cracks AdaBoost Code

arXiv:2308.01070v2 · h-index: 16
Originality: Synthesis-oriented
AI Analysis

This is an incremental analysis exposing discrepancies in a widely used machine learning library, relevant for practitioners and researchers in supervised learning.

The study reveals that AdaBoost's classifier combination can be derived analytically via a truth table, showing it does not minimize risk and that scikit-learn's implementation deviates from the original algorithm.

The principle of boosting in supervised learning involves combining multiple weak classifiers to obtain a stronger classifier. AdaBoost is reputed to be a perfect example of this approach. This study analyzes the two-class AdaBoost procedure implemented in scikit-learn. It shows that AdaBoost is an algorithm in name only, as the resulting combination of weak classifiers can be explicitly calculated using a truth table. Indeed, by logically analyzing the training set with the weak classifiers to construct a truth table, we recover, through an analytical formula, the weights of the combination of these weak classifiers obtained by the procedure. We observe that this formula does not give the minimizer of the risk, we provide a system to compute the exact minimizer, and we check that the AdaBoost procedure in scikit-learn does not implement the algorithm described by Freund and Schapire.
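To make the two objects in the abstract concrete, here is a minimal sketch, assuming the textbook two-class (discrete) AdaBoost update with labels in {-1, +1} and a fixed, pre-chosen sequence of weak classifiers. It is not the paper's analytical formula and not scikit-learn's code. The function names `adaboost_weights` and `truth_table` and the toy data are hypothetical, chosen only for illustration.

```python
from collections import Counter

import numpy as np


def adaboost_weights(H, y):
    """Textbook discrete AdaBoost coefficients (illustrative assumption).

    H: (T, n) array, H[t, j] is the output in {-1, +1} of weak classifier t
       on training point j. The classifiers are taken as given, in order.
    y: (n,) array of labels in {-1, +1}.
    Assumes every weighted error is strictly between 0 and 1.
    """
    T, n = H.shape
    w = np.full(n, 1.0 / n)          # uniform initial sample weights
    alphas = np.zeros(T)
    for t in range(T):
        eps = w[H[t] != y].sum()     # weighted error of classifier t
        alphas[t] = 0.5 * np.log((1.0 - eps) / eps)
        w = w * np.exp(-alphas[t] * y * H[t])  # up-weight the mistakes
        w /= w.sum()                 # renormalize to a distribution
    return alphas


def truth_table(H, y):
    """Count training points per (weak-classifier outputs, label) pattern."""
    table = Counter()
    for j in range(H.shape[1]):
        pattern = tuple(int(v) for v in H[:, j])
        table[(pattern, int(y[j]))] += 1
    return table


# Toy data (hypothetical): 6 training points, 2 weak classifiers.
y = np.array([1, 1, 1, -1, -1, -1])
H = np.array([
    [1, 1, -1, -1, -1, -1],   # classifier 1 misses point 2
    [1, 1, 1, 1, -1, -1],     # classifier 2 misses point 3
])

print(adaboost_weights(H, y))
for key, count in sorted(truth_table(H, y).items()):
    print(key, count)
```

The truth table depends only on how many training points fall into each pattern of weak-classifier outputs and label, and so do the coefficients computed by the loop. That observation is what makes a closed-form expression for the weights plausible.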

