Learning with Multiple Correct Answers -- A Trichotomy of Regret Bounds under Different Feedback Models
This addresses the challenge of learning when multiple answers are acceptable, particularly for language generation, but is incremental as it builds on existing online learning frameworks with new combinatorial dimensions.
The paper tackles the problem of online learning with multiple correct answers, where each instance has a set of valid labels, motivated by language generation tasks. It establishes optimal mistake bounds in the realizable setting and a trichotomy of regret bounds across three feedback models in the agnostic setting, with results implying sample complexity bounds for batch setups.
We study an online learning problem with multiple correct answers, where each instance admits a set of valid labels, and in each round the learner must output a valid label for the queried example. This setting is motivated by language generation tasks, in which a prompt may admit many acceptable completions, but not every completion is acceptable. We study this problem under three feedback models. For each model, we characterize the optimal mistake bound in the realizable setting using an appropriate combinatorial dimension. We then establish a trichotomy of regret bounds across the three models in the agnostic setting. Our results also imply sample complexity bounds for the batch setup that depend on the respective combinatorial dimensions.