LG DSJul 8, 2024

On the Complexity of Learning Sparse Functions with Statistical and Gradient Queries

Nirmit Joshi, Theodor Misiakiewicz, Nathan Srebro

arXiv:2407.05622v118.215 citationsh-index: 8

Originality Incremental advance

AI Analysis

This work addresses the theoretical understanding of gradient-based learning efficiency for sparse functions, which is incremental in refining query complexity models.

The paper investigates the complexity of gradient algorithms for learning sparse functions (juntas) by introducing Differentiable Learning Queries (DLQ) to model gradient queries, finding that query complexity depends on the loss function: for squared loss, DLQ matches Correlation Statistical Queries (CSQ) complexity, while for other losses like ℓ1 loss, it matches Statistical Queries (SQ) complexity.

The goal of this paper is to investigate the complexity of gradient algorithms when learning sparse functions (juntas). We introduce a type of Statistical Queries ($\mathsf{SQ}$), which we call Differentiable Learning Queries ($\mathsf{DLQ}$), to model gradient queries on a specified loss with respect to an arbitrary model. We provide a tight characterization of the query complexity of $\mathsf{DLQ}$ for learning the support of a sparse function over generic product distributions. This complexity crucially depends on the loss function. For the squared loss, $\mathsf{DLQ}$ matches the complexity of Correlation Statistical Queries $(\mathsf{CSQ})$--potentially much worse than $\mathsf{SQ}$. But for other simple loss functions, including the $\ell_1$ loss, $\mathsf{DLQ}$ always achieves the same complexity as $\mathsf{SQ}$. We also provide evidence that $\mathsf{DLQ}$ can indeed capture learning with (stochastic) gradient descent by showing it correctly describes the complexity of learning with a two-layer neural network in the mean field regime and linear scaling.

View on arXiv PDF

Similar