Agnostic Sample Compression Schemes for Regression
This work addresses a foundational problem in machine learning theory. For researchers in the area, it provides new compression bounds and negative results that refine prior work, and it poses open questions that generalize a classic conjecture.
The paper studies bounded-size sample compression for agnostic regression with the ℓ_p loss. It gives the first positive results: an approximate scheme whose size is exponential in the fat-shattering dimension, and efficient exact schemes for the ℓ_1 and ℓ_∞ losses. It also proves that no exact scheme of bounded size exists for any other ℓ_p loss.
We obtain the first positive results for bounded sample compression in the agnostic regression setting with the $\ell_p$ loss, where $p\in [1,\infty]$. We construct a generic approximate sample compression scheme for real-valued function classes whose size is exponential in the fat-shattering dimension but independent of the sample size. Notably, for linear regression, we construct an approximate compression scheme of size linear in the dimension. Moreover, for the $\ell_1$ and $\ell_\infty$ losses, we can even exhibit an efficient exact sample compression scheme of size linear in the dimension. We further show that for every other $\ell_p$ loss, $p\in (1,\infty)$, there does not exist an exact agnostic compression scheme of bounded size. This refines and generalizes a negative result of David, Moran, and Yehudayoff for the $\ell_2$ loss. We close by posing general open questions: for agnostic regression with the $\ell_1$ loss, does every function class admit an exact compression scheme of size equal to its pseudo-dimension? For the $\ell_2$ loss, does every function class admit an approximate compression scheme of size polynomial in the fat-shattering dimension? These questions generalize Warmuth's classic sample compression conjecture for realizable-case classification.
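To make the notion of an exact agnostic compression scheme concrete, here is a minimal toy sketch. It is not the paper's construction. It assumes the simplest possible class, the constant functions, under the $\ell_1$ loss, where a median of the labels minimizes the empirical loss and is itself one of the sample labels, so a single retained example suffices. The names `compress`, `reconstruct`, and `l1_loss` are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, float]  # (x, y) pair

def compress(sample: List[Example]) -> List[Example]:
    """Keep one example whose label is a (lower) median of all labels."""
    ordered = sorted(sample, key=lambda ex: ex[1])
    return [ordered[(len(ordered) - 1) // 2]]

def reconstruct(kept: List[Example]) -> Callable[[float], float]:
    """Rebuild the constant predictor from the single retained example."""
    c = kept[0][1]
    return lambda x: c

def l1_loss(h: Callable[[float], float], sample: List[Example]) -> float:
    """Average absolute error of h on the sample."""
    return sum(abs(h(x) - y) for x, y in sample) / len(sample)

# Toy data (made up for illustration); labels need not fit any constant.
sample = [(0.0, 3.0), (1.0, -1.0), (2.0, 10.0), (3.0, 4.0), (4.0, 0.5)]
kept = compress(sample)   # one example, regardless of the sample size
h = reconstruct(kept)     # constant predictor at the median label
```

In this toy case the reconstructed predictor attains the smallest empirical $\ell_1$ loss over all constants using a compression set of size one. By contrast, the $\ell_2$ minimizer over constants is the mean of the labels, which in general is not determined by any fixed-size subset of the sample; this gives some intuition for the negative result for $p\in(1,\infty)$.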