LG AI Sep 13, 2025

Decoupling Search and Learning in Neural Net Training

arXiv:2509.10973v1 h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses the exploratory limitations of gradient descent in neural network training, offering a potential direction for future training algorithms, though it is incremental as it builds on existing evolutionary and gradient-based methods.

The paper addresses gradient descent's limited exploration of alternative minima, some of which may generalize better. It proposes a two-phase framework: evolutionary search in representation space to find diverse solutions, followed by gradient-based learning that regresses to those representations. Networks trained this way approach SGD's performance on MNIST, CIFAR-10, and CIFAR-100, and performance improves with search compute up to saturation.

Gradient descent typically converges to a single minimum of the training loss without mechanisms to explore alternative minima that may generalize better. Searching for diverse minima directly in high-dimensional parameter space is generally intractable. To address this, we propose a framework that performs training in two distinct phases: search in a tractable representation space (the space of intermediate activations) to find diverse representational solutions, and gradient-based learning in parameter space by regressing to those searched representations. Through evolutionary search, we discover representational solutions whose fitness and diversity scale with compute: larger populations and more generations produce better and more varied solutions. These representations prove to be learnable: networks trained by regressing to searched representations approach SGD's performance on MNIST, CIFAR-10, and CIFAR-100. Performance improves with search compute up to saturation. The resulting models differ qualitatively from networks trained with gradient descent, following different representational trajectories during training. This work demonstrates how future training algorithms could overcome gradient descent's exploratory limitations by decoupling search in representation space from efficient gradient-based learning in parameter space.
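
The two phases described in the abstract can be illustrated with a toy NumPy sketch. This is not the paper's implementation: the abstract does not specify the fitness function, the mutation operator, or the architecture. The sketch assumes that fitness is the negative mean squared error of a least-squares linear readout, that mutation is additive Gaussian noise with elitist selection, and that the learner is a single tanh layer trained by plain gradient descent. All function names, parameters, and the random toy data are hypothetical.

```python
import numpy as np

def fitness(H, Y):
    """Assumed fitness: negative MSE of the best linear readout from H to Y."""
    W, *_ = np.linalg.lstsq(H, Y, rcond=None)
    return -np.mean((H @ W - Y) ** 2)

def evolve_representations(Y, dim, pop_size=32, generations=50, sigma=0.1, seed=0):
    """Phase 1 (search): evolve activation matrices H of shape (n, dim)."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    population = [rng.normal(size=(n, dim)) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        ranked = sorted(population, key=lambda H: fitness(H, Y), reverse=True)
        elites = ranked[: pop_size // 4]
        history.append(fitness(elites[0], Y))
        # Elites survive unchanged; the rest are Gaussian-mutated copies of elites.
        children = [
            elites[rng.integers(len(elites))] + sigma * rng.normal(size=(n, dim))
            for _ in range(pop_size - len(elites))
        ]
        population = elites + children
    best = max(population, key=lambda H: fitness(H, Y))
    return best, history

def regress_to_representation(X, H_target, steps=500, lr=0.1, seed=0):
    """Phase 2 (learning): gradient descent on W so tanh(X @ W) matches H_target."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.normal(size=(X.shape[1], H_target.shape[1]))
    losses = []
    for _ in range(steps):
        A = np.tanh(X @ W)
        err = A - H_target
        losses.append(np.mean(err ** 2))
        # Gradient of mean squared error through the tanh nonlinearity.
        grad = X.T @ (err * (1.0 - A ** 2)) * (2.0 / err.size)
        W -= lr * grad
    return W, losses

# Toy data standing in for a real dataset (random inputs, 3 random classes).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))
Y = np.eye(3)[rng.integers(0, 3, size=64)]

H_best, history = evolve_representations(Y, dim=4)
W, losses = regress_to_representation(X, H_best)
```

Because the search never looks at the inputs `X`, the searched representation is only useful if the second phase can actually regress to it. That is the learnability question the abstract raises. Raising `pop_size` or `generations` in this sketch corresponds to spending more search compute.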
