LG MLAug 12, 2023

Multiclass Learnability Does Not Imply Sample Compression

arXiv:2308.06424v29.86 citationsh-index: 9

Originality Incremental advance

AI Analysis

This is a foundational theoretical result for machine learning researchers, showing a key difference between binary and multiclass learnability, but it is incremental as it builds on prior work on sample compression and DS dimension.

The paper tackles the problem of whether learnable multiclass hypothesis classes admit sample compression schemes of size bounded by a finite function of their DS dimension, analogous to the binary case, and shows that this is not true, disproving the analogous statement for multiclass classes.

A hypothesis class admits a sample compression scheme, if for every sample labeled by a hypothesis from the class, it is possible to retain only a small subsample, using which the labels on the entire sample can be inferred. The size of the compression scheme is an upper bound on the size of the subsample produced. Every learnable binary hypothesis class (which must necessarily have finite VC dimension) admits a sample compression scheme of size only a finite function of its VC dimension, independent of the sample size. For multiclass hypothesis classes, the analog of VC dimension is the DS dimension. We show that the analogous statement pertaining to sample compression is not true for multiclass hypothesis classes: every learnable multiclass hypothesis class, which must necessarily have finite DS dimension, does not admit a sample compression scheme of size only a finite function of its DS dimension.

View on arXiv PDF

Similar