Unit-Consistent (UC) Adjoint for GSD and Backprop in Deep Learning Applications
The work addresses optimization inefficiencies in deep learning and is relevant to researchers and practitioners, but it is incremental because it builds on prior rescaling-invariant schemes.
The paper tackles the problem that gradient descent is not equivariant to the gauge symmetries of deep neural networks, so optimization depends on arbitrary parameterizations. It introduces a Unit-Consistent adjoint from which it derives gauge-consistent steepest descent and backpropagation.
Deep neural networks constructed from linear maps and positively homogeneous nonlinearities (e.g., ReLU) possess a fundamental gauge symmetry: the network function is invariant to node-wise diagonal rescalings. However, standard gradient descent is not equivariant to this symmetry, causing optimization trajectories to depend heavily on arbitrary parameterizations. Prior work has proposed rescaling-invariant optimization schemes for positively homogeneous networks (e.g., path-based or path-space updates). Our contribution is complementary: we formulate the invariance requirement at the level of the backward adjoint/optimization geometry, which provides a simple, operator-level recipe that can be applied uniformly across network components and optimizer state. By replacing the Euclidean transpose with a Unit-Consistent (UC) adjoint, we derive UC gauge-consistent steepest descent and backpropagation.
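The gauge symmetry and the failure of equivariance can be checked numerically. The following is a minimal sketch, assuming a two-layer ReLU network with squared loss and a single training example; all helper names (`forward`, `grads`, `gauge`, `gd_step`, `scaled_step`, `commutes`) are hypothetical. It verifies that rescaling hidden unit i by d_i > 0 (W1 -> D W1, W2 -> W2 D^{-1}) leaves the network function unchanged. It also checks that a plain gradient-descent step taken in the rescaled coordinates and then mapped back does not match the step taken in the original coordinates. For contrast, it includes an elementwise preconditioned step (each gradient entry multiplied by the squared weight). This step is a swapped-in update, chosen only because it transforms correctly under this rescaling; it is not the UC adjoint described above.

```python
import numpy as np

def forward(W1, W2, x):
    # Two-layer ReLU network: f(x) = W2 @ relu(W1 @ x)
    return W2 @ np.maximum(W1 @ x, 0.0)

def grads(W1, W2, x, y):
    # Euclidean gradients of the loss 0.5 * ||f(x) - y||^2
    h_pre = W1 @ x
    h = np.maximum(h_pre, 0.0)
    r = W2 @ h - y                        # output residual
    gW2 = np.outer(r, h)
    gh = (W2.T @ r) * (h_pre > 0)         # backprop through the ReLU mask
    gW1 = np.outer(gh, x)
    return gW1, gW2

def gauge(W1, W2, d):
    # Rescale hidden unit i by d[i] > 0: W1 -> D W1, W2 -> W2 D^{-1}
    return d[:, None] * W1, W2 / d[None, :]

def gd_step(W1, W2, x, y, lr):
    # Plain gradient descent (Euclidean transpose in backprop)
    gW1, gW2 = grads(W1, W2, x, y)
    return W1 - lr * gW1, W2 - lr * gW2

def scaled_step(W1, W2, x, y, lr):
    # Hypothetical stand-in, NOT the paper's UC adjoint: scaling each gradient
    # entry by its squared weight makes the update transform like the weights.
    gW1, gW2 = grads(W1, W2, x, y)
    return W1 - lr * W1**2 * gW1, W2 - lr * W2**2 * gW2

def commutes(step, W1, W2, x, y, d, lr=0.1):
    # Step in original coordinates vs. step in rescaled coordinates mapped
    # back (gauge by 1/d); True if both give the same weights.
    a = step(W1, W2, x, y, lr)
    b = gauge(*step(*gauge(W1, W2, d), x, y, lr), 1.0 / d)
    return all(np.allclose(p, q) for p, q in zip(a, b))

rng = np.random.default_rng(0)
W1 = rng.uniform(0.1, 1.0, size=(4, 3))   # positive weights and inputs keep every ReLU active
W2 = rng.normal(size=(2, 4))
x = rng.uniform(0.1, 1.0, size=3)
y = np.array([10.0, -10.0])               # far from f(x), so the residual is large
d = np.array([0.5, 2.0, 3.0, 0.1])        # arbitrary positive node-wise rescaling

same_function = np.allclose(forward(W1, W2, x), forward(*gauge(W1, W2, d), x))
gd_commutes = commutes(gd_step, W1, W2, x, y, d)
scaled_commutes = commutes(scaled_step, W1, W2, x, y, d)
print(same_function, gd_commutes, scaled_commutes)  # True False True
```

Plain gradient descent fails because, under the rescaling, the gradient of row i of W1 picks up a factor 1/d_i instead of d_i, so the mapped-back step is scaled by 1/d_i^2. Multiplying by the squared weight (factor d_i^2) restores a net factor of d_i, matching how W1 itself transforms, and the analogous argument holds for the columns of W2.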