LGDSMay 27, 2025

Algorithms and SQ Lower Bounds for Robustly Learning Real-valued Multi-index Models

arXiv:2505.21475v18 citationsh-index: 7
Originality Incremental advance
AI Analysis

This work addresses the complexity of robust learning for high-dimensional structured functions, offering theoretical insights with potential applications in machine learning, though it is incremental in advancing existing bounds.

The paper tackles the problem of learning real-valued Multi-Index Models (MIMs) under Gaussian distribution with adversarial label noise, presenting an algorithm with complexity d^{O(m)}2^{poly(K/ε)} and a nearly matching Statistical Query lower bound of d^{Ω(m)}. It also provides an efficient learner for positive-homogeneous Lipschitz MIMs with complexity poly(d)2^{poly(KL/ε)}, removing exponential dependence on network size for Lipschitz homogeneous ReLU networks.

We study the complexity of learning real-valued Multi-Index Models (MIMs) under the Gaussian distribution. A $K$-MIM is a function $f:\mathbb{R}^d\to \mathbb{R}$ that depends only on the projection of its input onto a $K$-dimensional subspace. We give a general algorithm for PAC learning a broad class of MIMs with respect to the square loss, even in the presence of adversarial label noise. Moreover, we establish a nearly matching Statistical Query (SQ) lower bound, providing evidence that the complexity of our algorithm is qualitatively optimal as a function of the dimension. Specifically, we consider the class of bounded variation MIMs with the property that degree at most $m$ distinguishing moments exist with respect to projections onto any subspace. In the presence of adversarial label noise, the complexity of our learning algorithm is $d^{O(m)}2^{\mathrm{poly}(K/ε)}$. For the realizable and independent noise settings, our algorithm incurs complexity $d^{O(m)}2^{\mathrm{poly}(K)}(1/ε)^{O(K)}$. To complement our upper bound, we show that if for some subspace degree-$m$ distinguishing moments do not exist, then any SQ learner for the corresponding class of MIMs requires complexity $d^{Ω(m)}$. As an application, we give the first efficient learner for the class of positive-homogeneous $L$-Lipschitz $K$-MIMs. The resulting algorithm has complexity $\mathrm{poly}(d) 2^{\mathrm{poly}(KL/ε)}$. This gives a new PAC learning algorithm for Lipschitz homogeneous ReLU networks with complexity independent of the network size, removing the exponential dependence incurred in prior work.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes