Trading-Off Payments and Accuracy in Online Classification with Paid Stochastic Experts
This work addresses the challenge of optimizing cost in online learning with paid experts, which is incremental as it builds on existing bandit and classification methods.
The paper tackles the problem of online classification with paid stochastic experts, where payment influences prediction accuracy via an unknown Lipschitz function, and the learner must balance payments and errors. It introduces an algorithm that achieves a total cost exceeding that of an optimal predictor with prior knowledge by at most O(K^2(log T)√T), improving upon a T^{2/3} bound from standard Lipschitz bandits.
We investigate online classification with paid stochastic experts. Here, before making their prediction, each expert must be paid. The amount that we pay each expert directly influences the accuracy of their prediction through some unknown Lipschitz "productivity" function. In each round, the learner must decide how much to pay each expert and then make a prediction. They incur a cost equal to a weighted sum of the prediction error and upfront payments for all experts. We introduce an online learning algorithm whose total cost after $T$ rounds exceeds that of a predictor which knows the productivity of all experts in advance by at most $\mathcal{O}(K^2(\log T)\sqrt{T})$ where $K$ is the number of experts. In order to achieve this result, we combine Lipschitz bandits and online classification with surrogate losses. These tools allow us to improve upon the bound of order $T^{2/3}$ one would obtain in the standard Lipschitz bandit setting. Our algorithm is empirically evaluated on synthetic data