Maximilian Engel

LG
h-index: 3
4 papers
11 citations
Novelty: 43%
AI Score: 43

4 Papers

PR Mar 6
Random Quadratic Form on a Sphere: Synchronization by Common Noise

Maximilian Engel, Anna Shalova

We introduce the Random Quadratic Form (RQF): a stochastic differential equation which formally corresponds to the gradient flow of a random quadratic functional on a sphere. While the one-point dynamics of the system is a Brownian motion and thus has no preferred direction, the two-point motion exhibits nontrivial synchronizing behaviour. In this work we study synchronization of the RQF; namely, we give both distributional and path-wise characterizations of the solutions by studying invariant measures and random attractors of the system. The RQF model is motivated by the study of the role of linear layers in transformers and illustrates the synchronization-by-common-noise phenomena arising in simplified models of transformers. In particular, we provide an alternative (independent of self-attention) explanation of the clustering behaviour in deep transformers and show that tokens cluster even in the absence of the self-attention mechanism.

LG Jul 29, 2024
Characterizing Dynamical Stability of Stochastic Gradient Descent in Overparameterized Learning

Dennis Chemnitz, Maximilian Engel

For overparameterized optimization tasks, such as those found in modern machine learning, global minima are generally not unique. In order to understand generalization in these settings, it is vital to study to which minimum an optimization algorithm converges. The possibility of having minima that are unstable under the dynamics imposed by the optimization algorithm limits the potential minima that the algorithm can find. In this paper, we characterize the global minima that are dynamically stable/unstable for both deterministic and stochastic gradient descent (SGD). In particular, we introduce a characteristic Lyapunov exponent that depends on the local dynamics around a global minimum and rigorously prove that the sign of this Lyapunov exponent determines whether SGD can accumulate at the respective global minimum.

NA Apr 30
Noise-induced enhancement of regime lifetimes -- A data-driven approach using deterministic trajectories

Henry Schoeller, Robin Chemnitz, Péter Koltai et al.

We investigate the lifetime of dynamical regimes under the impact of noise, motivated by low-dimensional models of the atmosphere. One may expect that the inclusion of noise tends to make the system leave prescribed regions of the state space faster. However, for relevant systems with complexities ranging from phenomenological toy models to reduced models of atmospheric dynamics, this intuition has proven misleading. As long as the noise is sufficiently small, the noisy system stays in regimes of interest on average longer than its deterministic counterpart, an effect we call "stochastic inertia". This phenomenon has been observed through extensive numerical simulations for different noise levels. We propose a numerical technique for testing the occurrence of stochastic inertia, constructing, for any fixed noise level, a Markov chain on the set of points given by a sufficiently long trajectory of the system without noise. The method is shown to correctly predict the presence of stochastic inertia in simple systems, and its utility is demonstrated on a paradigm model of atmospheric dynamics.

DS Jul 7, 2025
A Dynamical Systems Perspective on the Analysis of Neural Networks

Dennis Chemnitz, Maximilian Engel, Christian Kuehn et al.

In this chapter, we utilize dynamical systems to analyze several aspects of machine learning algorithms. As an expository contribution we demonstrate how to re-formulate a wide variety of challenges from deep neural networks, (stochastic) gradient descent, and related topics into dynamical statements. We also tackle three concrete challenges. First, we consider the process of information propagation through a neural network, i.e., we study the input-output map for different architectures. We explain the universal embedding property for augmented neural ODEs representing arbitrary functions of given regularity, the classification of multilayer perceptrons and neural ODEs in terms of suitable function classes, and the memory-dependence in neural delay equations. Second, we consider the training aspect of neural networks dynamically. We describe a dynamical systems perspective on gradient descent and study stability for overdetermined problems. We then extend this analysis to the overparameterized setting and describe the edge of stability phenomenon, also in the context of possible explanations for implicit bias. For stochastic gradient descent, we present stability results for the overparameterized setting via Lyapunov exponents of interpolation solutions. Third, we explain several results regarding mean-field limits of neural networks. We describe a result that extends existing techniques to heterogeneous neural networks involving graph limits via digraph measures. This shows how large classes of neural networks naturally fall within the framework of Kuramoto-type models on graphs and their large-graph limits. Finally, we point out that similar strategies to use dynamics to study explainable and reliable AI can also be applied to settings such as generative models or fundamental issues in gradient training methods, such as backpropagation or vanishing/exploding gradients.