LG, NA · Aug 4, 2025

Neural Networks with Orthogonal Jacobian

Alex Massucco, Davide Murari, Carola-Bibiane Schönlieb

arXiv:2508.02882v1 · 2 citations · h-index: 18

Originality: Incremental advance

AI Analysis

This work addresses the trainability of deep neural networks, a practical concern for machine learning practitioners. Its approach is an incremental advance over existing methods such as orthogonal initialization and residual architectures.

The authors tackle vanishing and exploding gradients in very deep neural networks by introducing a framework for networks whose Jacobian matrices are orthogonal. Such networks achieve perfect dynamical isometry and can be trained efficiently at large depth without relying on skip connections. Experimental evidence shows competitive performance.

Very deep neural networks achieve state-of-the-art performance by extracting rich, hierarchical features. Yet, training them via backpropagation is often hindered by vanishing or exploding gradients. Existing remedies, such as orthogonal or variance-preserving initialisation and residual architectures, allow for more stable gradient propagation and the training of deeper models. In this work, we introduce a unified mathematical framework that describes a broad class of nonlinear feedforward and residual networks whose input-to-output Jacobian matrices are exactly orthogonal almost everywhere. Such a constraint forces the resulting networks to achieve perfect dynamical isometry and train efficiently despite being very deep. Our formulation not only recovers standard architectures as particular cases but also yields new designs that match the trainability of residual networks without relying on conventional skip connections. We provide experimental evidence that perfect Jacobian orthogonality at initialisation is sufficient to stabilise training and achieve competitive performance. We compare this strategy to networks regularised to maintain Jacobian orthogonality and obtain comparable results. We further extend our analysis to a class of networks well approximated by those with orthogonal Jacobians and introduce networks with Jacobians representing partial isometries. These generalised models are then shown to maintain the favourable trainability properties.
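To make "exactly orthogonal Jacobian almost everywhere" concrete, here is a minimal toy sketch. It is not the construction from the paper, which this summary does not spell out. It assumes a hypothetical layer x -> |Qx| with Q orthogonal and an element-wise absolute value as the activation. Away from zeros of Qx, the layer Jacobian is diag(sign(Qx)) Q, a product of two orthogonal matrices, so the Jacobian of any stack of such layers stays orthogonal and all its singular values equal 1 (perfect dynamical isometry). The helper names, the dimension and the depth are illustrative choices.

```python
import math
import random


def givens(n, i, j, theta):
    """n x n rotation by theta in the (i, j) coordinate plane (orthogonal)."""
    q = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    q[i][i] = q[j][j] = math.cos(theta)
    q[i][j] = -math.sin(theta)
    q[j][i] = math.sin(theta)
    return q


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


def matvec(a, x):
    """Matrix-vector product."""
    return [sum(a[r][k] * x[k] for k in range(len(x))) for r in range(len(a))]


def random_orthogonal(n, rng):
    """Product of random Givens rotations; orthogonal by construction."""
    q = givens(n, 0, 1, 0.0)  # rotation by zero is the identity
    for i in range(n):
        for j in range(i + 1, n):
            q = matmul(givens(n, i, j, rng.uniform(0.0, 2.0 * math.pi)), q)
    return q


def forward_and_jacobian(weights, x):
    """Apply x -> |Q x| layer by layer and accumulate the Jacobian (chain rule)."""
    n = len(x)
    jac = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for q in weights:
        z = matvec(q, x)
        signs = [1.0 if v >= 0 else -1.0 for v in z]
        # Layer Jacobian: diag(signs) @ q, valid wherever no entry of z is zero.
        layer_jac = [[signs[r] * q[r][c] for c in range(n)] for r in range(n)]
        jac = matmul(layer_jac, jac)
        x = [abs(v) for v in z]
    return x, jac


def orthogonality_error(jac):
    """Largest absolute entry of J^T J - I."""
    n = len(jac)
    jac_t = [list(col) for col in zip(*jac)]
    gram = matmul(jac_t, jac)
    return max(abs(gram[r][c] - (1.0 if r == c else 0.0))
               for r in range(n) for c in range(n))


rng = random.Random(0)
dim, depth = 4, 200  # illustrative sizes, not taken from the paper
weights = [random_orthogonal(dim, rng) for _ in range(depth)]
x0 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
out, jac = forward_and_jacobian(weights, x0)
err = orthogonality_error(jac)
print(f"depth={depth}, max |J^T J - I| = {err:.2e}")
```

Even at depth 200 the deviation of J^T J from the identity stays at floating-point round-off, and the output norm equals the input norm. A generic deep network would instead see its Jacobian singular values drift towards zero or infinity with depth, which is the vanishing or exploding gradient problem the abstract refers to.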
