ML, AI, LG | Mar 17, 2025

Optimizing ML Training with Metagradient Descent

Logan Engstrom, Andrew Ilyas, Benjamin Chen, Axel Feldmann, William Moses, Aleksander Madry

arXiv:2503.13751v1 | 22 citations | h-index: 31

Originality: Highly original

AI Analysis

This work addresses the problem of configuring training setups for machine learning practitioners. It offers a novel gradient-based method and applies it to specific bottlenecks such as dataset selection and learning rate scheduling.

The paper tackles the challenge of configuring training processes for large-scale machine learning models by developing a gradient-based approach built on metagradients. Reported improvements include better dataset selection than existing methods, accuracy-degrading data poisoning attacks that outperform existing ones by an order of magnitude, and automatically found learning rate schedules that are competitive.

A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based approach to this problem. We first introduce an algorithm for efficiently calculating metagradients -- gradients through model training -- at scale. We then introduce a "smooth model training" framework that enables effective optimization using metagradients. With metagradient descent (MGD), we greatly improve on existing dataset selection methods, outperform accuracy-degrading data poisoning attacks by an order of magnitude, and automatically find competitive learning rate schedules.
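To make "gradients through model training" concrete, here is a minimal toy sketch in plain Python. It is not the paper's algorithm: the abstract describes an efficient method for computing metagradients at scale, whereas this sketch simply carries a derivative forward through a few gradient descent steps on a one-parameter quadratic loss. The names `train` and `tune_lr`, the loss, and all constants are assumptions chosen for illustration.

```python
def train(lr, steps=20, theta0=0.0, a=2.0, b=3.0):
    """Run gradient descent on L(theta) = 0.5 * a * (theta - b)**2.

    Returns (final_loss, metagrad), where metagrad is the derivative of
    the final loss with respect to the learning rate, obtained by
    applying the chain rule through every training step.
    """
    theta = theta0
    dtheta_dlr = 0.0  # d(theta)/d(lr); zero before any update
    for _ in range(steps):
        grad = a * (theta - b)      # dL/dtheta at the current theta
        dgrad_dlr = a * dtheta_dlr  # how that gradient changes with lr
        # Differentiate the update theta <- theta - lr * grad w.r.t. lr.
        dtheta_dlr = dtheta_dlr - grad - lr * dgrad_dlr
        theta = theta - lr * grad
    final_loss = 0.5 * a * (theta - b) ** 2
    metagrad = a * (theta - b) * dtheta_dlr
    return final_loss, metagrad


def tune_lr(lr=0.05, meta_steps=50, meta_lr=0.001):
    """Toy metagradient descent: adjust lr to lower the post-training loss."""
    for _ in range(meta_steps):
        _, metagrad = train(lr)
        lr -= meta_lr * metagrad
    return lr
```

Calling `tune_lr()` returns a learning rate whose training run ends at a lower loss than the starting value of 0.05. In this toy the metaparameter is a single scalar; the abstract's applications (dataset selection, data poisoning, learning rate schedules) involve much larger design spaces, which is why an efficient metagradient computation matters.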
