LGMay 17, 2021

Multiclass Classification using dilute bandit feedback

arXiv:2105.08093v13.11 citations

Originality Incremental advance

AI Analysis

This addresses online learning challenges with limited supervision for applications like recommendation systems, though it is incremental as it builds on bandit feedback settings.

The paper tackles multiclass classification with diluted bandit feedback, where algorithms predict candidate label sets and receive binary feedback on whether the true label is included, achieving an O(T^{1-1/(m+2)}) mistake bound for set size m.

This paper introduces a new online learning framework for multiclass classification called learning with diluted bandit feedback. At every time step, the algorithm predicts a candidate label set instead of a single label for the observed example. It then receives feedback from the environment whether the actual label lies in this candidate label set or not. This feedback is called "diluted bandit feedback". Learning in this setting is even more challenging than the bandit feedback setting, as there is more uncertainty in the supervision. We propose an algorithm for multiclass classification using dilute bandit feedback (MC-DBF), which uses the exploration-exploitation strategy to predict the candidate set in each trial. We show that the proposed algorithm achieves O(T^{1-\frac{1}{m+2}}) mistake bound if candidate label set size (in each step) is m. We demonstrate the effectiveness of the proposed approach with extensive simulations.

View on arXiv PDF

Similar