Algorithms and SQ Lower Bounds for Robustly Learning Real-valued Multi-index Models
This work addresses the complexity of robust learning for high-dimensional structured functions, offering theoretical insights with potential applications in machine learning, though it is incremental in advancing existing bounds.
The paper tackles the problem of learning real-valued Multi-Index Models (MIMs) under Gaussian distribution with adversarial label noise, presenting an algorithm with complexity d^{O(m)}2^{poly(K/ε)} and a nearly matching Statistical Query lower bound of d^{Ω(m)}. It also provides an efficient learner for positive-homogeneous Lipschitz MIMs with complexity poly(d)2^{poly(KL/ε)}, removing exponential dependence on network size for Lipschitz homogeneous ReLU networks.
We study the complexity of learning real-valued Multi-Index Models (MIMs) under the Gaussian distribution. A $K$-MIM is a function $f:\mathbb{R}^d\to \mathbb{R}$ that depends only on the projection of its input onto a $K$-dimensional subspace. We give a general algorithm for PAC learning a broad class of MIMs with respect to the square loss, even in the presence of adversarial label noise. Moreover, we establish a nearly matching Statistical Query (SQ) lower bound, providing evidence that the complexity of our algorithm is qualitatively optimal as a function of the dimension. Specifically, we consider the class of bounded variation MIMs with the property that degree at most $m$ distinguishing moments exist with respect to projections onto any subspace. In the presence of adversarial label noise, the complexity of our learning algorithm is $d^{O(m)}2^{\mathrm{poly}(K/ε)}$. For the realizable and independent noise settings, our algorithm incurs complexity $d^{O(m)}2^{\mathrm{poly}(K)}(1/ε)^{O(K)}$. To complement our upper bound, we show that if for some subspace degree-$m$ distinguishing moments do not exist, then any SQ learner for the corresponding class of MIMs requires complexity $d^{Ω(m)}$. As an application, we give the first efficient learner for the class of positive-homogeneous $L$-Lipschitz $K$-MIMs. The resulting algorithm has complexity $\mathrm{poly}(d) 2^{\mathrm{poly}(KL/ε)}$. This gives a new PAC learning algorithm for Lipschitz homogeneous ReLU networks with complexity independent of the network size, removing the exponential dependence incurred in prior work.