ML | LG | OC | Feb 23

Path-conditioned training: a principled way to rescale ReLU neural networks

arXiv:2602.19799v1 | 1 citation | h-index: 12
Originality: Incremental advance
AI Analysis

This paper addresses training efficiency for neural network practitioners, though it appears incremental because it builds on the existing path-lifting framework.

The paper asks how the rescaling symmetries of ReLU neural networks can be exploited to improve training dynamics. It introduces path-conditioned training, a method that aligns a kernel in the path-lifting space with a chosen reference; numerical experiments suggest this can speed up training.

Despite recent algorithmic advances, we still lack principled ways to leverage the well-documented rescaling symmetries in ReLU neural network parameters. While two properly rescaled sets of weights implement the same function, their training dynamics can be dramatically different. To offer a fresh perspective on exploiting this phenomenon, we build on the recent path-lifting framework, which provides a compact factorization of ReLU networks. We introduce a geometrically motivated criterion to rescale neural network parameters, whose minimization leads to a conditioning strategy that aligns a kernel in the path-lifting space with a chosen reference. We derive an efficient algorithm to perform this alignment. In the context of random network initialization, we analyze how the architecture and the initialization scale jointly impact the output of the proposed method. Numerical experiments illustrate its potential to speed up training.
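
The rescaling symmetry mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the paper's algorithm: it only shows, for a hypothetical one-hidden-layer network with hand-picked weights, that multiplying the incoming weights and bias of each hidden neuron by a positive factor and dividing its outgoing weights by the same factor leaves the output unchanged, while the gradient with respect to the outgoing weights changes. The names `forward`, `rescale`, and `grad_W2`, the weight values, and the squared-norm loss are all assumptions made for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(W1, b1, W2, x):
    # One-hidden-layer ReLU network: y = W2 @ relu(W1 @ x + b1)
    return W2 @ relu(W1 @ x + b1)

def rescale(W1, b1, W2, lam):
    # Scale neuron i's incoming weights and bias by lam[i] > 0
    # and its outgoing weights by 1 / lam[i].
    return W1 * lam[:, None], b1 * lam, W2 / lam[None, :]

def grad_W2(W1, b1, W2, x):
    # Gradient of the illustrative loss 0.5 * ||y||^2 with respect to W2,
    # which equals outer(y, h) with h the hidden activations.
    h = relu(W1 @ x + b1)
    y = W2 @ h
    return np.outer(y, h)

# Hand-picked (hypothetical) weights and input.
W1 = np.array([[1.0, 0.0, -1.0],
               [0.5, 2.0, 0.0],
               [-1.0, 1.0, 1.0],
               [0.0, -1.0, 2.0]])
b1 = np.array([0.1, -0.2, 0.3, 0.0])
W2 = np.array([[1.0, -1.0, 0.5, 2.0],
               [0.0, 1.0, -1.0, 0.5]])
x = np.array([1.0, 2.0, 3.0])
lam = np.array([0.5, 2.0, 4.0, 8.0])  # positive per-neuron factors

W1r, b1r, W2r = rescale(W1, b1, W2, lam)

y = forward(W1, b1, W2, x)
yr = forward(W1r, b1r, W2r, x)
same_function = np.allclose(y, yr)

g = grad_W2(W1, b1, W2, x)
gr = grad_W2(W1r, b1r, W2r, x)
same_gradient = np.allclose(g, gr)

print("same output:", same_function)
print("same gradient w.r.t. W2:", same_gradient)
```

The invariance relies on the positive homogeneity of ReLU, relu(c z) = c relu(z) for c > 0, so the factors must be strictly positive. Because the gradients differ between the two equivalent parameterizations, a gradient step taken from each one generally leads to different functions, which is the gap a rescaling criterion is meant to exploit.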

