LG | AI | ML | May 23, 2023

Transferring Learning Trajectories of Neural Networks

arXiv:2305.14122v2 | 4 citations
Originality: Incremental advance
AI Analysis

This paper addresses the computational cost that machine learning practitioners incur from duplicated training runs, such as those in model ensembling and fine-tuning. It is rated incremental because it builds on existing training methods.

The paper tackles the problem of expensive duplicated training of deep neural networks by transferring a learning trajectory from one initial parameter to another, achieving non-trivial accuracy without direct training and significantly faster training than starting from scratch.

Training deep neural networks (DNNs) is computationally expensive, which is problematic especially when performing duplicated or similar training runs in model ensemble or fine-tuning pre-trained models, for example. Once we have trained one DNN on some dataset, we have its learning trajectory (i.e., a sequence of intermediate parameters during training) which may potentially contain useful information for learning the dataset. However, there has been no attempt to utilize such information of a given learning trajectory for another training. In this paper, we formulate the problem of "transferring" a given learning trajectory from one initial parameter to another one (learning transfer problem) and derive the first algorithm to approximately solve it by matching gradients successively along the trajectory via permutation symmetry. We empirically show that the transferred parameters achieve non-trivial accuracy before any direct training, and can be trained significantly faster than training from scratch.
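To make the two ingredients named in the abstract concrete, here is a minimal sketch in NumPy. It assumes a toy two-layer ReLU network and is not the paper's algorithm: the helper names (`forward`, `permute_hidden`, `best_permutation`) are hypothetical, and the brute-force search over permutations is a stand-in that is only feasible for a handful of hidden units. It shows (1) that reordering hidden units leaves the network's function unchanged (permutation symmetry), and (2) how one could pick the permutation under which a source update best aligns with the target network's descent direction (a simple form of gradient matching).

```python
import itertools
import numpy as np

def forward(W1, b1, W2, x):
    """Toy two-layer ReLU MLP: x -> W2 @ relu(W1 @ x + b1)."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0)

def permute_hidden(W1, b1, W2, perm):
    """Reorder hidden units: rows of W1/b1 and columns of W2 move together."""
    return W1[perm], b1[perm], W2[:, perm]

def best_permutation(delta_src, grad_tgt):
    """Brute-force search (illustrative only) for the hidden-unit permutation
    whose permuted source update has the largest inner product with the
    target's descent direction, i.e. with -grad_tgt."""
    hidden = delta_src[0].shape[0]
    best, best_score = None, -np.inf
    for perm in itertools.permutations(range(hidden)):
        p = list(perm)
        permuted = permute_hidden(*delta_src, p)
        score = -sum(np.sum(d * g) for d, g in zip(permuted, grad_tgt))
        if score > best_score:
            best, best_score = p, score
    return best

# Permutation symmetry: the permuted network computes the same function.
rng = np.random.default_rng(0)
W1, b1, W2 = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=(2, 4))
x = rng.normal(size=3)
perm = [2, 0, 3, 1]
same = np.allclose(forward(W1, b1, W2, x),
                   forward(*permute_hidden(W1, b1, W2, perm), x))
```

Under these assumptions, a transferred step would add the permuted source update to the target's current parameters, and the matching would be repeated at successive points along the source trajectory. A practical implementation would replace the exhaustive search with an assignment solver, since the number of permutations grows factorially with the hidden width.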

