LG, AI, ML | Sep 26, 2025

Global Convergence in Neural ODEs: Impact of Activation Functions

arXiv:2509.22436v1 | 5 citations | h-index: 6 | ICLR
Originality: Incremental advance
AI Analysis

This work addresses convergence issues for researchers and practitioners using Neural ODEs, offering incremental theoretical insights and practical guidelines.

The paper tackles training challenges in Neural ODEs by analyzing how activation functions affect convergence, showing that smoothness and nonlinearity ensure global convergence under gradient descent in overparameterized regimes, with theoretical results validated by numerical experiments.

Neural Ordinary Differential Equations (ODEs) have been successful in various applications due to their continuous nature and parameter-sharing efficiency. However, these unique characteristics also introduce challenges in training, particularly with respect to gradient computation accuracy and convergence analysis. In this paper, we address these challenges by investigating the impact of activation functions. We demonstrate that the properties of activation functions, specifically smoothness and nonlinearity, are critical to the training dynamics. Smooth activation functions guarantee globally unique solutions for both forward and backward ODEs, while sufficient nonlinearity is essential for maintaining the spectral properties of the Neural Tangent Kernel (NTK) during training. Together, these properties enable us to establish the global convergence of Neural ODEs under gradient descent in overparameterized regimes. Our theoretical findings are validated by numerical experiments, which not only support our analysis but also provide practical guidelines for scaling Neural ODEs, potentially leading to faster training and improved performance in real-world applications.
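The role of the NTK spectrum described in the abstract can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration, not the paper's setup: the parameterization (input layer `U`, shared ODE weights `W`, readout `v`, with 1/sqrt scaling), the forward-Euler discretization, the choice of softplus as the smooth activation, and the finite-difference Jacobian. It computes the empirical NTK Gram matrix K = J Jᵀ on a few inputs and reports its smallest eigenvalue, the quantity that NTK-style convergence arguments typically need to stay bounded away from zero.

```python
import numpy as np

def softplus(z):
    # Smooth activation with derivative in (0, 1), hence Lipschitz (illustrative choice).
    return np.logaddexp(0.0, z)

def unpack(theta, n, d):
    # Split the flat parameter vector into U (n x d), W (n x n), v (n,).
    U = theta[: n * d].reshape(n, d)
    W = theta[n * d : n * d + n * n].reshape(n, n)
    v = theta[n * d + n * n :]
    return U, W, v

def neural_ode(theta, X, n, steps=20, T=1.0):
    # Hypothetical model: h(0) = s(U x / sqrt(d)), dh/dt = s(W h / sqrt(n)),
    # f(x) = v . h(T) / sqrt(n), integrated with forward Euler.
    d = X.shape[1]
    U, W, v = unpack(theta, n, d)
    h = softplus(X @ U.T / np.sqrt(d))          # shape (N, n)
    dt = T / steps
    for _ in range(steps):
        h = h + dt * softplus(h @ W.T / np.sqrt(n))
    return h @ v / np.sqrt(n)                   # shape (N,)

def empirical_ntk(theta, X, n, eps=1e-6):
    # Central finite differences for the Jacobian J of outputs w.r.t. parameters,
    # then the Gram matrix K = J J^T (symmetric positive semidefinite).
    J = np.empty((X.shape[0], theta.size))
    for k in range(theta.size):
        e = np.zeros_like(theta)
        e[k] = eps
        J[:, k] = (neural_ode(theta + e, X, n) - neural_ode(theta - e, X, n)) / (2 * eps)
    return J @ J.T

rng = np.random.default_rng(0)
n, d, N = 16, 3, 4                               # width, input dim, sample count
X = rng.standard_normal((N, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)    # unit-norm inputs
theta = rng.standard_normal(n * d + n * n + n)   # random Gaussian initialization

K = empirical_ntk(theta, X, n)
lam_min = np.linalg.eigvalsh(K).min()
print("smallest NTK eigenvalue:", lam_min)
```

Swapping `softplus` for another activation and re-running gives a rough way to compare how the smallest eigenvalue behaves at initialization; this is only a diagnostic at toy scale and says nothing by itself about the paper's overparameterized regime.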
