Partial Feedback Online Learning
This work addresses a novel learning protocol for online learning with partial feedback, providing foundational insights into learnability conditions, though it is incremental in extending existing online learning theory.
The paper tackles the problem of partial-feedback online learning, where learners receive only one acceptable label per round, by introducing a collection version space and new dimensions (PFLdim and PMSdim) that tightly characterize minimax regret, resolving an open question and showing linear regret barriers beyond set realizability.
We study a new learning protocol, termed partial-feedback online learning, where each instance admits a set of acceptable labels, but the learner observes only one acceptable label per round. We highlight that, while classical version space is widely used for online learnability, it does not directly extend to this setting. We address this obstacle by introducing a collection version space, which maintains sets of hypotheses rather than individual hypotheses. Using this tool, we obtain a tight characterization of learnability in the set-realizable regime. In particular, we define the Partial-Feedback Littlestone dimension (PFLdim) and the Partial-Feedback Measure Shattering dimension (PMSdim), and show that they tightly characterize the minimax regret for deterministic and randomized learners, respectively. We further identify a nested inclusion condition under which deterministic and randomized learnability coincide, resolving an open question of Raman et al. (2024b). Finally, given a hypothesis space H, we show that beyond set realizability, the minimax regret can be linear even when |H|=2, highlighting a barrier beyond set realizability.