LG DS ST MLMay 21, 2024

Online Learning of Halfspaces with Massart Noise

Ilias Diakonikolas, Vasilis Kontonis, Christos Tzamos, Nikos Zarifis

arXiv:2405.12958v14.61 citationsh-index: 33

Originality Incremental advance

AI Analysis

This addresses the problem of robust online learning under label noise for machine learning practitioners, offering efficient algorithms with provable guarantees, though it builds incrementally on existing noise models and methods.

The paper tackles online learning of halfspaces with Massart noise, presenting an efficient algorithm that achieves a mistake bound of ηT + o(T), which is qualitatively tight. It also extends this to a contextual bandit setting, achieving expected reward at least (1-1/k)ΔT - o(T) above random action selection.

We study the task of online learning in the presence of Massart noise. Instead of assuming that the online adversary chooses an arbitrary sequence of labels, we assume that the context $\mathbf{x}$ is selected adversarially but the label $y$ presented to the learner disagrees with the ground-truth label of $\mathbf{x}$ with unknown probability at most $η$. We study the fundamental class of $γ$-margin linear classifiers and present a computationally efficient algorithm that achieves mistake bound $ηT + o(T)$. Our mistake bound is qualitatively tight for efficient algorithms: it is known that even in the offline setting achieving classification error better than $η$ requires super-polynomial time in the SQ model. We extend our online learning model to a $k$-arm contextual bandit setting where the rewards -- instead of satisfying commonly used realizability assumptions -- are consistent (in expectation) with some linear ranking function with weight vector $\mathbf{w}^\ast$. Given a list of contexts $\mathbf{x}_1,\ldots \mathbf{x}_k$, if $\mathbf{w}^*\cdot \mathbf{x}_i > \mathbf{w}^* \cdot \mathbf{x}_j$, the expected reward of action $i$ must be larger than that of $j$ by at least $Δ$. We use our Massart online learner to design an efficient bandit algorithm that obtains expected reward at least $(1-1/k)~ ΔT - o(T)$ bigger than choosing a random action at every round.

View on arXiv PDF

Similar