LG, AI, Feb 9

Regime Change Hypothesis: Foundations for Decoupled Dynamics in Neural Network Training

Cristian Pérez-Corral, Alberto Fernández-Hernández, Jose I. Mestre, Manuel F. Dolz, Jose Duato, Enrique S. Quintana-Ortí

arXiv:2602.08333v1, 5.83 citations, h-index: 16

Originality: Incremental advance

AI Analysis

This work provides a concrete, architecture-agnostic tool for monitoring training dynamics, which could aid in developing decoupled optimization strategies for piecewise-linear networks, though the advance is incremental.

The paper investigates whether neural network training exhibits a two-timescale behavior, with early changes in activation patterns and later refinement within stable regimes, and finds that activation-pattern changes decay three times earlier than weight-update magnitudes across various architectures.

Despite the empirical success of deep neural networks (DNNs), their internal training dynamics remain difficult to characterize. In ReLU-based models, the activation pattern induced by a given input determines the piecewise-linear region in which the network behaves affinely. Motivated by this geometry, we investigate whether training exhibits a two-timescale behavior: an early stage with substantial changes in activation patterns and a later stage where weight updates predominantly refine the model within largely stable activation regimes. We first prove a local stability property: outside measure-zero sets of parameters and inputs, sufficiently small parameter perturbations preserve the activation pattern of a fixed input, implying locally affine behavior within activation regions. We then empirically track per-iteration changes in weights and activation patterns across fully-connected and convolutional architectures, as well as Transformer-based models, where activation patterns are recorded in the ReLU feed-forward (MLP/FFN) submodules, using fixed validation subsets. Across the evaluated settings, activation-pattern changes decay 3 times earlier than weight-update magnitudes, showing that late-stage training often proceeds within relatively stable activation regimes. These findings provide a concrete, architecture-agnostic instrument for monitoring training dynamics and motivate further study of decoupled optimization strategies for piecewise-linear networks. For reproducibility, code and experiment configurations will be released upon acceptance.
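The two quantities tracked in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (their code is not yet released). It assumes a small fully-connected ReLU network stored as a list of (W, b) pairs, a random array standing in for the fixed validation subset, and two hypothetical metric definitions: the fraction of (input, hidden unit) pairs whose ReLU on/off state flips between consecutive parameter sets, and the relative L2 norm of the parameter update. All function names are illustrative.

```python
import numpy as np

def activation_pattern(params, x):
    """Binary ReLU on/off pattern of every hidden unit, one row per input."""
    patterns = []
    h = x
    for W, b in params[:-1]:      # hidden layers only; the last layer is linear
        z = h @ W + b             # pre-activations of this layer
        patterns.append(z > 0)    # True where the ReLU is active
        h = np.maximum(z, 0.0)
    return np.concatenate(patterns, axis=1)

def pattern_change_rate(params_old, params_new, x):
    """Fraction of (input, unit) pairs whose ReLU state differs (assumed metric)."""
    old = activation_pattern(params_old, x)
    new = activation_pattern(params_new, x)
    return float(np.mean(old != new))

def relative_update_norm(params_old, params_new):
    """||theta_new - theta_old|| / ||theta_old|| over all weights and biases."""
    def flat(ps):
        return np.concatenate([a.ravel() for W, b in ps for a in (W, b)])
    old, new = flat(params_old), flat(params_new)
    return float(np.linalg.norm(new - old) / np.linalg.norm(old))

# Toy usage: a random network and a small random perturbation standing in
# for one optimizer step (no real training is performed here).
rng = np.random.default_rng(0)
x_val = rng.normal(size=(64, 10))   # stand-in for a fixed validation subset
theta = [(rng.normal(size=(10, 32)), rng.normal(size=32)),
         (rng.normal(size=(32, 1)), rng.normal(size=1))]
theta_next = [(W + 1e-3 * rng.normal(size=W.shape),
               b + 1e-3 * rng.normal(size=b.shape)) for W, b in theta]

print("pattern change rate:", pattern_change_rate(theta, theta_next, x_val))
print("relative update norm:", relative_update_norm(theta, theta_next))
```

Logging both values at every iteration of a real training loop would give the two curves whose decay times the abstract compares. The paper's exact normalization and sampling choices may differ from those assumed here.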

