LG Jun 16, 2025

The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions

Devin Kwok, Gül Sena Altıntaş, Colin Raffel, David Rolnick

arXiv:2506.13234v113.04 citations h-index: 17 ICML

Originality: Incremental advance

AI Analysis

This work addresses the problem of training stability for machine learning practitioners, offering insights into fine-tuning and model merging, though it is incremental in nature.

The study investigated the sensitivity of neural network training trajectories to initial conditions, finding that small perturbations during the early chaotic phase cause significant divergence in parameters and learned functions, with effects diminishing over time.

Neural network training is inherently sensitive to initialization and the randomness induced by stochastic gradient descent. However, it is unclear to what extent such effects lead to meaningfully different networks, either in terms of the models' weights or the underlying functions that were learned. In this work, we show that during the initial "chaotic" phase of training, even extremely small perturbations reliably cause otherwise identical training trajectories to diverge, an effect that diminishes rapidly over training time. We quantify this divergence through (i) $L^2$ distance between parameters, (ii) the loss barrier when interpolating between networks, (iii) $L^2$ distance and barrier between parameters after permutation alignment, and (iv) representational similarity between intermediate activations, revealing how perturbations across different hyperparameter or fine-tuning settings drive training trajectories toward distinct loss minima. Our findings provide insights into neural network training stability, with practical implications for fine-tuning, model merging, and diversity of model ensembles.
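
As a rough illustration of metrics (i) and (ii), the sketch below computes the $L^2$ distance between two flattened parameter vectors and a loss barrier along the straight line between them. The abstract does not define the barrier, so this assumes one common convention: the largest excess of the interpolated loss over the linear interpolation of the endpoint losses. The names `l2_distance`, `loss_barrier`, and `toy_loss` are hypothetical, and the one-dimensional double-well loss stands in for a real network's loss; none of this is the authors' code.

```python
import math
from typing import Callable, List, Sequence

Params = Sequence[float]


def l2_distance(theta_a: Params, theta_b: Params) -> float:
    """Euclidean (L2) distance between two flattened parameter vectors."""
    if len(theta_a) != len(theta_b):
        raise ValueError("parameter vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(theta_a, theta_b)))


def interpolate(theta_a: Params, theta_b: Params, alpha: float) -> List[float]:
    """Point (1 - alpha) * theta_a + alpha * theta_b on the connecting line."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(theta_a, theta_b)]


def loss_barrier(
    loss_fn: Callable[[Params], float],
    theta_a: Params,
    theta_b: Params,
    num_points: int = 11,
) -> float:
    """Assumed definition: max over alpha of
    loss(interpolated params) - interpolated endpoint losses."""
    loss_a = loss_fn(theta_a)
    loss_b = loss_fn(theta_b)
    barrier = 0.0
    for i in range(num_points):
        alpha = i / (num_points - 1)
        loss_mid = loss_fn(interpolate(theta_a, theta_b, alpha))
        baseline = (1.0 - alpha) * loss_a + alpha * loss_b
        barrier = max(barrier, loss_mid - baseline)
    return barrier


def toy_loss(theta: Params) -> float:
    """Hypothetical double-well loss with minima at x = -1 and x = +1."""
    return sum((x * x - 1.0) ** 2 for x in theta)


# Two "solutions" sitting in different minima of the toy loss.
theta_a = [-1.0]
theta_b = [1.0]
distance = l2_distance(theta_a, theta_b)
barrier = loss_barrier(toy_loss, theta_a, theta_b)
```

In this toy setting, both endpoints have zero loss but the midpoint does not, so the barrier is positive. That is the signature of two parameter vectors resting in distinct minima. A barrier near zero would instead suggest the two are linearly connected.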
