Decoupled Descent: Exact Test Error Tracking Via Approximate Message Passing

arXiv:2604.278839.1


AI Analysis

For practitioners using parametric models, DD provides a principled way to monitor test error without a validation set, though its theoretical guarantees are limited to stylized Gaussian mixture models.

Decoupled Descent (DD) is a training algorithm that enforces train error to asymptotically track test error for Gaussian mixture models, achieving zero-cost validation and 100% data utilization. In experiments, DD outperforms gradient descent on XOR classification and narrows the generalization gap on noisy MNIST and CIFAR-10.

In modern parametric model training, full-batch gradient descent (GD) and its variants become progressively more biased towards the exact realization of the training data; this drives the systematic "generalization gap", where the train error becomes an unreliable proxy for the test error. Existing approaches either argue through complex analysis that this gap is benign or sacrifice data to a validation set. In contrast, we introduce decoupled descent (DD), a novel theory-based training algorithm that satisfies a train-test identity: it enforces that the train error asymptotically tracks the test error for stylized Gaussian mixture models. Within this specific regime, leveraging approximate message passing theory, DD iteratively cancels the biases due to data reuse, rigorously demonstrating the feasibility of zero-cost validation and $100\%$ data utilization. Moreover, DD is governed by a low-dimensional state evolution recursion, rendering the dynamics of the algorithm transparent and tractable. We validate DD on XOR classification, where it yields superior performance compared to GD; additionally, we run DD on noisy MNIST and on non-linear probing of CIFAR-10, demonstrating that even when our stylized assumptions are relaxed, DD narrows the generalization gap compared to GD.
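
The generalization gap that motivates DD can be reproduced in a few lines. The following is a minimal sketch, assuming a two-cluster Gaussian mixture and full-batch GD on the logistic loss; it is not DD itself, whose update rule is not given in this abstract. All function names and parameter values (`run_demo`, `signal`, dimensions, step size) are illustrative assumptions, not taken from the paper.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sample_mixture(n, d, signal, rng):
    """Draw n points x = y * mu + z, with y in {-1, +1} and z ~ N(0, I_d).

    mu has equal entries and norm `signal` (an assumed, illustrative choice).
    """
    mu_entry = signal / math.sqrt(d)
    xs, ys = [], []
    for _ in range(n):
        y = rng.choice((-1, 1))
        xs.append([y * mu_entry + rng.gauss(0.0, 1.0) for _ in range(d)])
        ys.append(y)
    return xs, ys


def error_rate(w, xs, ys):
    """Fraction of points whose label disagrees with sign(w . x)."""
    wrong = sum(1 for x, y in zip(xs, ys) if y * dot(w, x) <= 0)
    return wrong / len(ys)


def gd_logistic(xs, ys, steps=200, lr=0.1):
    """Full-batch gradient descent on the average logistic loss, reusing all data each step."""
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for x, y in zip(xs, ys):
            margin = y * dot(w, x)
            # d/d(margin) of log(1 + exp(-margin)) is -1 / (1 + exp(margin));
            # the clamp only guards against float overflow.
            coeff = -y / (1.0 + math.exp(min(margin, 700.0)))
            for j in range(d):
                grad[j] += coeff * x[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def run_demo(seed=0, n_train=40, n_test=400, d=100, signal=1.0):
    """Return (train_error, test_error) for plain GD on one random draw."""
    rng = random.Random(seed)
    train_x, train_y = sample_mixture(n_train, d, signal, rng)
    test_x, test_y = sample_mixture(n_test, d, signal, rng)
    w = gd_logistic(train_x, train_y)
    return error_rate(w, train_x, train_y), error_rate(w, test_x, test_y)


if __name__ == "__main__":
    train_err, test_err = run_demo()
    print(f"train error: {train_err:.3f}, test error: {test_err:.3f}")
```

With more features than training points, the training set is linearly separable, so GD can push the train error well below the error on fresh samples. The difference between the two returned values is the gap that, according to the abstract, DD is designed to close without holding out a validation set.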
