LG · ML · Sep 9, 2024

Breaking Neural Network Scaling Laws with Modularity

arXiv:2409.05780v2 · 7 citations · h-index: 35
Originality: Highly original
AI Analysis

This paper addresses the challenge of scaling neural networks on complex, modular tasks. It offers a theoretical foundation and a practical method for improving generalization, though it builds incrementally on existing generalization theory.

The paper examines how modular neural networks improve generalization on compositional tasks. It shows theoretically that the sample complexity of modular networks is independent of task dimensionality, whereas nonmodular networks require a number of samples that grows exponentially with it. It also develops a learning rule for modular networks that empirically improves generalization on high-dimensional tasks.

Modular neural networks outperform nonmodular neural networks on tasks ranging from visual question answering to robotics. These performance improvements are thought to be due to modular networks' superior ability to model the compositional and combinatorial structure of real-world problems. However, a theoretical explanation of how modularity improves generalizability, and how to leverage task modularity while training networks remains elusive. Using recent theoretical progress in explaining neural network generalization, we investigate how the amount of training data required to generalize on a task varies with the intrinsic dimensionality of a task's input. We show theoretically that when applied to modularly structured tasks, while nonmodular networks require an exponential number of samples with task dimensionality, modular networks' sample complexity is independent of task dimensionality: modular networks can generalize in high dimensions. We then develop a novel learning rule for modular networks to exploit this advantage and empirically show the improved generalization of the rule, both in- and out-of-distribution, on high-dimensional, modular tasks.
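The contrast between exponential and dimension-independent sample requirements can be made concrete with a back-of-envelope counting sketch. This is not the paper's derivation: it assumes, purely for illustration, that a learner must see at least one sample per cell of a grid over its input, that a nonmodular learner grids the full `m`-dimensional input, and that each of `k` modules only grids its own `b`-dimensional input. The names `nonmodular_cells`, `modular_cells`, `m`, `k`, `b`, and `r` are hypothetical and do not come from the paper.

```python
def nonmodular_cells(m, r):
    """Grid cells needed to cover an m-dimensional input at r bins per axis."""
    return r ** m


def modular_cells(k, b, r):
    """Total grid cells when each of k modules covers its own b-dimensional input."""
    return k * r ** b


if __name__ == "__main__":
    # Illustrative assumption: 4 modules, each with a 2-dimensional input, 10 bins per axis.
    for m in (4, 8, 16, 32):
        print(m, nonmodular_cells(m, r=10), modular_cells(k=4, b=2, r=10))
```

Under these assumptions the nonmodular count grows as `r ** m`, while the modular count stays fixed as `m` increases because it depends only on `k`, `b`, and `r`.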

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
