LGJul 23, 2024

Online Learning with Sublinear Best-Action Queries

arXiv:2407.16355v12 citationsh-index: 19
Originality Highly original
AI Analysis

This work addresses the challenge of efficient decision-making in online learning when predictive features are costly, offering theoretical guarantees for algorithms with sublinear query budgets.

The paper tackles the problem of online learning with limited access to best-action queries, establishing tight regret bounds for different feedback models. It shows that with sublinear queries, optimal regret of Θ(min{√T, T/k}) can be achieved in full feedback, and improves regret to Θ(min{T/√k, T²/k²}) in a challenging limited feedback setting.

In online learning, a decision maker repeatedly selects one of a set of actions, with the goal of minimizing the overall loss incurred. Following the recent line of research on algorithms endowed with additional predictive features, we revisit this problem by allowing the decision maker to acquire additional information on the actions to be selected. In particular, we study the power of \emph{best-action queries}, which reveal beforehand the identity of the best action at a given time step. In practice, predictive features may be expensive, so we allow the decision maker to issue at most $k$ such queries. We establish tight bounds on the performance any algorithm can achieve when given access to $k$ best-action queries for different types of feedback models. In particular, we prove that in the full feedback model, $k$ queries are enough to achieve an optimal regret of $Θ\left(\min\left\{\sqrt T, \frac Tk\right\}\right)$. This finding highlights the significant multiplicative advantage in the regret rate achievable with even a modest (sublinear) number $k \in Ω(\sqrt{T})$ of queries. Additionally, we study the challenging setting in which the only available feedback is obtained during the time steps corresponding to the $k$ best-action queries. There, we provide a tight regret rate of $Θ\left(\min\left\{\frac{T}{\sqrt k},\frac{T^2}{k^2}\right\}\right)$, which improves over the standard $Θ\left(\frac{T}{\sqrt k}\right)$ regret rate for label efficient prediction for $k \in Ω(T^{2/3})$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes