Deterministic Apple Tasting
This resolves a conjecture about deterministic learnability in apple tasting, providing foundational insights for online learning with partial feedback.
The paper tackles the problem of deterministic online classification with apple tasting feedback, where feedback is only received when predicting 1, and shows that deterministic learning is feasible in the realizable case with a mistake bound of O(√(L(H) T log T)), which is tight for some classes, and characterizes learnability in the agnostic case with a trichotomy of easy, hard, or unlearnable classes.
In binary ($0/1$) online classification with apple tasting feedback, the learner receives feedback only when predicting $1$. Besides some degenerate learning tasks, all previously known learning algorithms for this model are randomized. Consequently, prior to this work it was unknown whether deterministic apple tasting is generally feasible. In this work, we provide the first widely-applicable deterministic apple tasting learner, and show that in the realizable case, a hypothesis class is learnable if and only if it is deterministically learnable, confirming a conjecture of [Raman, Subedi, Raman, Tewari-24]. Quantitatively, we show that every class $\mathcal{H}$ is learnable with mistake bound $O \left(\sqrt{\mathtt{L}(\mathcal{H}) T \log T} \right)$ (where $\mathtt{L}(\mathcal{H})$ is the Littlestone dimension of $\mathcal{H}$), and that this is tight for some classes. We further study the agnostic case, in which the best hypothesis makes at most $k$ many mistakes, and prove a trichotomy stating that every class $\mathcal{H}$ must be either easy, hard, or unlearnable. Easy classes have (both randomized and deterministic) mistake bound $Θ_{\mathcal{H}}(k)$. Hard classes have randomized mistake bound $\tildeΘ_{\mathcal{H}} \left(k + \sqrt{T} \right)$, and deterministic mistake bound $\tildeΘ_{\mathcal{H}} \left(\sqrt{k \cdot T} \right)$, where $T$ is the time horizon. Unlearnable classes have (both randomized and deterministic) mistake bound $Θ(T)$. Our upper bound is based on a deterministic algorithm for learning from expert advice with apple tasting feedback, a problem interesting in its own right. For this problem, we show that the optimal deterministic mistake bound is $Θ\left(\sqrt{T (k + \log n)} \right)$ for all $k$ and $T \leq n \leq 2^T$, where $n$ is the number of experts.