Why Smooth Stability Assumptions Fail for ReLU Learning
This work addresses a foundational issue in stability analysis for deep learning, highlighting the limitations of smooth approximations for ReLU networks and motivating nonsmooth-aware frameworks.
The paper shows, via a concrete counterexample, that smoothness assumptions such as gradient Lipschitzness cannot hold globally for ReLU networks, even in empirically stable settings, and it identifies a minimal generalized derivative condition under which stability statements can be restored.
Stability analyses of modern learning systems are frequently derived under smoothness assumptions that are violated by ReLU-type nonlinearities. In this note, we isolate a minimal obstruction by showing that no uniform smoothness-based stability proxy, such as gradient Lipschitzness or Hessian control, can hold globally for ReLU networks, even in simple settings where training trajectories appear empirically stable. We give a concrete counterexample demonstrating the failure of classical stability bounds and identify a minimal generalized derivative condition under which stability statements can be meaningfully restored. The result clarifies why smooth approximations of ReLU can be misleading and motivates nonsmooth-aware stability frameworks.
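To illustrate the kind of obstruction at issue, here is a minimal sketch. It is not necessarily the counterexample constructed in the note. The one-parameter model relu(w * x), the squared loss, the data point (x, y) = (1, -1), and the convention relu'(0) = 0 are all assumptions chosen for illustration. For this loss the gradient is 0 for w < 0 and w + 1 for w > 0, so it jumps by about 1 across w = 0, and the difference quotient that a global gradient-Lipschitz constant would have to bound grows without limit as the two evaluation points approach the kink.

```python
def relu(z):
    """Standard ReLU activation."""
    return max(z, 0.0)


def loss_grad(w, x=1.0, y=-1.0):
    """Gradient in w of 0.5 * (relu(w * x) - y) ** 2.

    Assumption for illustration: relu'(0) is taken to be 0.
    """
    pre = w * x
    active = 1.0 if pre > 0 else 0.0  # derivative of ReLU away from the kink
    return (relu(pre) - y) * active * x


def lipschitz_ratio(eps):
    """Difference quotient |g(eps) - g(-eps)| / (2 * eps) across the kink at w = 0."""
    return abs(loss_grad(eps) - loss_grad(-eps)) / (2.0 * eps)


# Evaluate the quotient at points approaching the kink from both sides.
exponents = range(1, 7)
ratios = [lipschitz_ratio(10.0 ** -k) for k in exponents]

for k, r in zip(exponents, ratios):
    print(f"eps = 1e-{k}: gradient difference quotient = {r:.3e}")
```

Any finite constant L with |g(w) - g(w')| <= L |w - w'| for all w, w' is eventually exceeded by these quotients, so no global gradient-Lipschitz bound holds for this toy loss, even though the loss is piecewise quadratic and well behaved on each side of w = 0.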