Multiclass learnability and the ERM principle
This work shows that multiclass learning is not a straightforward extension of binary classification: the choice of ERM learner, which does not affect learnability in the binary case, can matter in the multiclass case.
The paper investigates the sample complexity of multiclass prediction. Unlike in binary classification, different ERM learners for the same hypothesis class can have different sample complexities, and some can fail to learn a class that others learn. It introduces a design principle for good ERM learners, uses it to prove tight sample complexity bounds for symmetric multiclass classes (those invariant under permutations of label names), and characterizes mistake and regret bounds in the online and bandit settings using new generalizations of Littlestone's dimension.
We study the sample complexity of multiclass prediction in several learning settings. For the PAC setting our analysis reveals a surprising phenomenon: In sharp contrast to binary classification, we show that there exist multiclass hypothesis classes for which some Empirical Risk Minimizers (ERM learners) have lower sample complexity than others. Furthermore, there are classes that are learnable by some ERM learners, while other ERM learners will fail to learn them. We propose a principle for designing good ERM learners, and use this principle to prove tight bounds on the sample complexity of learning {\em symmetric} multiclass hypothesis classes---classes that are invariant under permutations of label names. We further provide a characterization of mistake and regret bounds for multiclass learning in the online setting and the bandit setting, using new generalizations of Littlestone's dimension.
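The gap between ERM learners can be illustrated with a small simulation. The sketch below is not taken from this abstract; it assumes a toy class over a finite domain in which each hypothesis h_A labels the points of a subset A with A itself and every other point with a special label "*". All names (`make_hypothesis`, `good_erm`, `bad_erm`, `run`) are hypothetical and chosen for illustration. Both learners return a hypothesis with zero empirical error, so both are ERM learners, yet they generalize very differently when the target is the all-"*" hypothesis.

```python
import random

STAR = "*"  # special label meaning "x is outside the set A"


def make_hypothesis(subset):
    """Return h_A with h_A(x) = A if x is in A, else STAR (assumed toy class)."""
    A = frozenset(subset)
    return lambda x: A if x in A else STAR


def good_erm(sample, domain):
    """ERM: if some label reveals A, return h_A; otherwise return the all-STAR hypothesis."""
    for _, y in sample:
        if y != STAR:
            return make_hypothesis(y)
    return make_hypothesis(frozenset())


def bad_erm(sample, domain):
    """ERM: if some label reveals A, return h_A; otherwise guess A = all unseen points."""
    for _, y in sample:
        if y != STAR:
            return make_hypothesis(y)
    seen = {x for x, _ in sample}
    return make_hypothesis(frozenset(domain) - seen)


def empirical_error(h, sample):
    """Fraction of sample points that h mislabels."""
    return sum(h(x) != y for x, y in sample) / len(sample)


def true_error(h, target, domain):
    """Error of h against target under the uniform distribution on the domain."""
    return sum(h(x) != target(x) for x in domain) / len(domain)


def run(domain_size=1000, m=50, seed=0):
    """Draw m uniform examples labeled by the all-STAR target and compare both learners."""
    rng = random.Random(seed)
    domain = range(domain_size)
    target = make_hypothesis(frozenset())  # target is h_A with A empty
    sample = [(x, target(x)) for x in (rng.randrange(domain_size) for _ in range(m))]
    results = {}
    for name, learner in (("good", good_erm), ("bad", bad_erm)):
        h = learner(sample, domain)
        results[name] = (empirical_error(h, sample), true_error(h, target, domain))
    return results


if __name__ == "__main__":
    for name, (emp, true) in run().items():
        print(f"{name}: empirical error = {emp:.2f}, true error = {true:.2f}")
```

With the default settings both learners have zero empirical error, but the second one errs on every unseen point, so its true error is at least 0.95. Making its true error small requires a sample that covers most of the domain, whereas the first learner is already exact after any sample. This is only an illustration under the stated assumptions, not a statement of the paper's bounds.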