Tight Bounds on $\ell_1$ Approximation and Learning of Self-Bounding Functions
This work provides foundational theoretical insights into the complexity of learning self-bounding functions, which include submodular and XOS functions, and yields improved bounds for PAC and agnostic learning of these functions under the uniform distribution.
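The paper tackles the problem of approximating and learning self-bounding functions over the Boolean hypercube. It shows that every such function can be $\epsilon$-approximated in $\ell_1$ by a polynomial of degree $\tilde{O}(1/\epsilon)$ over $2^{\tilde{O}(1/\epsilon)}$ variables. Both bounds are nearly tight, and they improve on the $\Theta(1/\epsilon^2)$ degree and $2^{\Theta(1/\epsilon^2)}$ junta-size obtained by previous techniques based on $\ell_2$ approximation.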
We study the complexity of learning and approximation of self-bounding functions over the uniform distribution on the Boolean hypercube $\{0,1\}^n$. Informally, a function $f:\{0,1\}^n \rightarrow \mathbb{R}$ is self-bounding if for every $x \in \{0,1\}^n$, $f(x)$ upper bounds the sum of all the $n$ marginal decreases in the value of the function at $x$. Self-bounding functions include such well-known classes of functions as submodular and fractionally-subadditive (XOS) functions. They were introduced by Boucheron et al. (2000) in the context of concentration of measure inequalities. Our main result is a nearly tight $\ell_1$-approximation of self-bounding functions by low-degree juntas. Specifically, all self-bounding functions can be $ε$-approximated in $\ell_1$ by a polynomial of degree $\tilde{O}(1/ε)$ over $2^{\tilde{O}(1/ε)}$ variables. We show that both the degree and junta-size are optimal up to logarithmic terms. Previous techniques considered stronger $\ell_2$ approximation and proved nearly tight bounds of $Θ(1/ε^{2})$ on the degree and $2^{Θ(1/ε^2)}$ on the number of variables. Our bounds rely on the analysis of noise stability of self-bounding functions together with a stronger connection between noise stability and $\ell_1$ approximation by low-degree polynomials. This technique can also be used to get tighter bounds on $\ell_1$ approximation by low-degree polynomials and a faster learning algorithm for halfspaces. These results lead to improved and in several cases almost tight bounds for PAC and agnostic learning of self-bounding functions relative to the uniform distribution. In particular, assuming hardness of learning juntas, we show that PAC and agnostic learning of self-bounding functions have complexity $n^{\tilde{Θ}(1/ε)}$.
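The informal definition of a self-bounding function can be made concrete with a brute-force check on a small hypercube. The sketch below is illustrative only and rests on an assumed formalization: the marginal decrease at coordinate $i$ is taken to be $f(x) - \min(f(x), f(x^{\oplus i}))$, where $x^{\oplus i}$ is $x$ with bit $i$ flipped, and $f$ is assumed non-negative. Any further normalization conditions in the formal definition are not checked. The names `marginal_decreases`, `is_self_bounding`, `capped_count`, and `squared_count` are hypothetical and chosen for this example.

```python
from itertools import product


def marginal_decreases(f, x):
    """For each coordinate i, return f(x) - min(f(x), f(x with bit i flipped)).

    This is an assumed reading of "marginal decrease": how much the value
    can drop when a single coordinate of x is changed.
    """
    fx = f(x)
    drops = []
    for i in range(len(x)):
        y = x[:i] + (1 - x[i],) + x[i + 1:]  # flip bit i
        drops.append(fx - min(fx, f(y)))
    return drops


def is_self_bounding(f, n, tol=1e-12):
    """Check by enumeration that f(x) >= sum of marginal decreases for all x.

    Runs over all 2^n points, so it is only practical for small n.
    """
    return all(
        sum(marginal_decreases(f, x)) <= f(x) + tol
        for x in product((0, 1), repeat=n)
    )


def capped_count(x):
    # Monotone submodular example: number of ones, capped at 2.
    return min(sum(x), 2)


def squared_count(x):
    # Counterexample: the square of the number of ones.
    return sum(x) ** 2


print(is_self_bounding(capped_count, 3))   # True
print(is_self_bounding(squared_count, 3))  # False
```

At $x = (1,1,1)$ the function `squared_count` has value 9, while flipping any single bit drops it to 4, so the three marginal decreases sum to 15 and the condition fails. For `capped_count`, no point has marginal decreases summing to more than the function value.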