LG | Feb 13, 2025

When Do Neural Networks Learn World Models?

arXiv:2502.09297v5 | 5 citations | h-index: 8 | ICML
Originality: Highly original
AI Analysis

The paper addresses a foundational open problem in AI: whether neural networks can learn world models. The results have implications for self-supervised learning and generalization, though the contribution is incremental in that it builds on existing theoretical frameworks.

The paper tackles the problem of whether neural networks can learn world models that capture latent data-generating variables, showing theoretically that models with a low-degree bias can provably recover these variables in a multi-task setting under mild assumptions, even with complex proxy tasks.

Humans develop world models that capture the underlying generation process of data. Whether neural networks can learn similar world models remains an open problem. In this work, we present the first theoretical results for this problem, showing that in a multi-task setting, models with a low-degree bias provably recover latent data-generating variables under mild assumptions, even if proxy tasks involve complex, non-linear functions of the latents. However, such recovery is sensitive to model architecture. Our analysis leverages Boolean models of task solutions via the Fourier-Walsh transform and introduces new techniques for analyzing invertible Boolean transforms, which may be of independent interest. We illustrate the algorithmic implications of our results and connect them to related research areas, including self-supervised learning, out-of-distribution generalization, and the linear representation hypothesis in large language models.
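
To make the terms "Fourier-Walsh transform" and "degree" concrete, here is a minimal sketch that computes the Fourier-Walsh coefficients of a small Boolean function by brute force. It is not taken from the paper: the function names (`fourier_walsh`, `degree`), the toy tasks, and the choice of the map `(z0, z1) -> (z0, z0*z1)` are all illustrative assumptions. The last part only shows that the degree of a task depends on which invertible re-encoding of the inputs is used, which is the kind of quantity a low-degree bias would be sensitive to.

```python
from itertools import combinations, product
from math import prod


def fourier_walsh(f, n):
    """Brute-force Fourier-Walsh coefficients of f: {-1, +1}^n -> R.

    Returns a dict mapping each index subset S (a tuple) to
    E_x[f(x) * prod_{i in S} x_i] under the uniform distribution.
    """
    points = list(product((-1, 1), repeat=n))
    coeffs = {}
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            # math.prod of an empty sequence is 1, which handles S = ().
            total = sum(f(x) * prod(x[i] for i in subset) for x in points)
            coeffs[subset] = total / len(points)
    return coeffs


def degree(coeffs, tol=1e-12):
    """Size of the largest subset with a non-negligible coefficient."""
    return max((len(s) for s, c in coeffs.items() if abs(c) > tol), default=0)


# Toy task (hypothetical): majority vote over three +/-1 inputs.
def majority3(x):
    return 1 if sum(x) > 0 else -1


# Toy task (hypothetical), written in terms of variables z = (z0, z1).
def task_in_z(z):
    return z[0] * z[1]


# The same task after the invertible re-encoding x = (z0, z0 * z1):
# since x1 = z0 * z1, the task is simply the second coordinate.
def task_in_x(x):
    return x[1]


maj_coeffs = fourier_walsh(majority3, 3)
deg_maj = degree(maj_coeffs)
deg_z = degree(fourier_walsh(task_in_z, 2))
deg_x = degree(fourier_walsh(task_in_x, 2))
print(deg_maj, deg_z, deg_x)
```

Brute force enumerates all 2^n inputs and all 2^n subsets, so this is only practical for very small n; it is meant for intuition, not for reproducing the paper's analysis.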

