LG · CV · Sep 26, 2022

Learning to Learn with Generative Models of Neural Network Checkpoints

Berkeley
arXiv:2209.12892v1 · 96 citations · h-index: 111
Originality: Highly original
AI Analysis

This paper introduces a novel data-driven approach to neural network optimization that could speed up training in both supervised and reinforcement learning.

The paper tackles the problem of learning to optimize neural networks by training a conditional diffusion transformer on checkpoint data to predict parameter updates that achieve desired metrics, enabling optimization of unseen networks in one update across various architectures and tasks.

We explore a data-driven approach for learning to optimize neural networks. We construct a dataset of neural network checkpoints and train a generative model on the parameters. In particular, our model is a conditional diffusion transformer that, given an initial input parameter vector and a prompted loss, error, or return, predicts the distribution over parameter updates that achieve the desired metric. At test time, it can optimize neural networks with unseen parameters for downstream tasks in just one update. We find that our approach successfully generates parameters for a wide range of loss prompts. Moreover, it can sample multimodal parameter solutions and has favorable scaling properties. We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
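
The sampling interface described in the abstract (starting parameters and a prompted metric in, updated parameters out) can be sketched as a standard DDPM-style reverse process. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_denoiser` is a hypothetical stand-in for the trained conditional diffusion transformer, the sketch assumes the network predicts the clean target parameters directly, and the linear noise schedule, step count, and all function names are chosen here purely for illustration.

```python
import math
import random


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumed choice, not taken from the paper)."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta          # cumulative product of (1 - beta)
        alpha_bars.append(running)
    return betas, alpha_bars


def toy_denoiser(noisy_params, step, start_params, start_loss, target_loss):
    """Hypothetical stand-in for the trained conditional model.

    A real model would use noisy_params and step; this toy ignores them and
    simply rescales the starting parameters by target_loss / start_loss.
    """
    scale = target_loss / start_loss
    return [scale * p for p in start_params]


def sample_updated_params(start_params, start_loss, target_loss,
                          denoiser=toy_denoiser, num_steps=50, seed=0):
    """Draw one sample of updated parameters via DDPM ancestral sampling."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    # Start from pure Gaussian noise with the same length as the parameters.
    x = [rng.gauss(0.0, 1.0) for _ in start_params]
    for t in reversed(range(num_steps)):
        # The denoiser is conditioned on the starting parameters, their
        # current metric, and the prompted (desired) metric.
        x0_hat = denoiser(x, t, start_params, start_loss, target_loss)
        abar_t = alpha_bars[t]
        abar_prev = alpha_bars[t - 1] if t > 0 else 1.0
        # Mean and standard deviation of the posterior q(x_{t-1} | x_t, x0).
        coef_x0 = math.sqrt(abar_prev) * betas[t] / (1.0 - abar_t)
        coef_xt = math.sqrt(1.0 - betas[t]) * (1.0 - abar_prev) / (1.0 - abar_t)
        sigma = math.sqrt(betas[t] * (1.0 - abar_prev) / (1.0 - abar_t))
        x = [coef_x0 * x0 + coef_xt * xt + sigma * rng.gauss(0.0, 1.0)
             for x0, xt in zip(x0_hat, x)]
    return x


if __name__ == "__main__":
    start = [1.0, -2.0, 0.5]
    print(sample_updated_params(start, start_loss=2.0, target_loss=1.0))
```

Because the final reverse step adds no noise, the returned vector here is just the toy denoiser's last prediction. With a trained model in its place, different seeds would yield different samples from the learned distribution, which is how multimodal parameter solutions could be drawn.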

Code Implementations: 1 repo
