Gradient flow in parameter space is equivalent to linear interpolation in output space
This result clarifies optimization dynamics in deep learning by relating gradient flow on the parameters to gradient flow on the network outputs.
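The paper proves that standard gradient flow in parameter space can be continuously deformed into an adapted flow that induces (constrained) Euclidean gradient flow in output space. For the L² loss with a full-rank Jacobian, a reparametrization of time turns this flow into linear interpolation that reaches a global minimum. For the cross-entropy loss, under the same rank condition and labels with positive components, the paper gives an explicit formula for the unique global minimum.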
We prove that the standard gradient flow in parameter space that underlies many training algorithms in deep learning can be continuously deformed into an adapted gradient flow which yields (constrained) Euclidean gradient flow in output space. Moreover, for the $L^{2}$ loss, if the Jacobian of the outputs with respect to the parameters is full rank (for fixed training data), then the time variable can be reparametrized so that the resulting flow is simply linear interpolation, and a global minimum can be achieved. For the cross-entropy loss, under the same rank condition and assuming the labels have positive components, we derive an explicit formula for the unique global minimum.
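To make the $L^{2}$ statement concrete, the following is a minimal sketch of the underlying computation, not the paper's full argument. The notation is assumed here and does not appear in the abstract: $\theta$ denotes the parameters, $x(\theta)$ the stacked outputs on the fixed training data, $y$ the stacked labels, $D = \partial x / \partial \theta$ the Jacobian, $\mathcal{C}$ the loss, and $x_0$ the outputs at initialization. We read "full rank" as full row rank, so that $D D^{T}$ is invertible, and we write the adapted flow in one form consistent with the abstract; the paper's exact construction and normalization may differ.

$$
\begin{aligned}
\text{standard flow:}\quad & \partial_s\theta = -D^{T}\nabla_x\mathcal{C}, \qquad \partial_s x = D\,\partial_s\theta = -D D^{T}\nabla_x\mathcal{C},\\
\text{adapted flow:}\quad & \partial_s\theta = -D^{T}(D D^{T})^{-1}\nabla_x\mathcal{C}, \qquad \partial_s x = -\nabla_x\mathcal{C},\\
L^{2}\text{ loss:}\quad & \mathcal{C} = \tfrac{1}{2}\lvert x-y\rvert^{2}, \qquad \partial_s x = -(x-y), \qquad x(s) = y + e^{-s}(x_0-y),\\
t = 1-e^{-s}:\quad & x = (1-t)\,x_0 + t\,y, \qquad t\in[0,1).
\end{aligned}
$$

In the rescaled time $t$ the outputs move along the straight segment from $x_0$ to $y$, and $\mathcal{C} \to 0$ as $t \to 1$ (equivalently $s \to \infty$). A constant prefactor in the loss only rescales the time variable $s$ and does not change the path. The cross-entropy case is not reproduced in this sketch.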