MLLGOct 11, 2018

Online Multiclass Boosting with Bandit Feedback

arXiv:1810.05290v27 citations
Originality Incremental advance
AI Analysis

This work addresses the problem of limited feedback in online learning for multiclass classification, offering a solution that is incremental by extending existing full-information algorithms to the bandit setting.

The paper tackles online multiclass classification with bandit feedback, where the learner only knows if its prediction is correct, by proposing boosting algorithms that use an unbiased loss estimate to update weak learners, achieving asymptotic error bounds matching full-information methods but with higher sample complexity.

We present online boosting algorithms for multiclass classification with bandit feedback, where the learner only receives feedback about the correctness of its prediction. We propose an unbiased estimate of the loss using a randomized prediction, allowing the model to update its weak learners with limited information. Using the unbiased estimate, we extend two full information boosting algorithms (Jung et al., 2017) to the bandit setting. We prove that the asymptotic error bounds of the bandit algorithms exactly match their full information counterparts. The cost of restricted feedback is reflected in the larger sample complexity. Experimental results also support our theoretical findings, and performance of the proposed models is comparable to that of an existing bandit boosting algorithm, which is limited to use binary weak learners.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes