Benjamin Gess

LG
h-index: 7
Papers: 9
Citations: 169
Novelty: 53%
AI Score: 43

9 Papers

PR · Nov 7, 2017
Strong convergence rates for explicit space-time discrete numerical approximations of stochastic Allen-Cahn equations

Sebastian Becker, Benjamin Gess, Arnulf Jentzen et al.

The scientific literature contains a number of numerical approximation results for stochastic partial differential equations (SPDEs) with superlinearly growing nonlinearities but, to the best of our knowledge, none of them prove strong or weak convergence rates for full-discrete numerical approximations of space-time white noise driven SPDEs with superlinearly growing nonlinearities. In particular, in the scientific literature there exists neither a result which proves strong convergence rates nor a result which proves weak convergence rates for full-discrete numerical approximations of stochastic Allen-Cahn equations. In this article we bridge this gap and establish strong convergence rates for full-discrete numerical approximations of space-time white noise driven SPDEs with superlinearly growing nonlinearities such as stochastic Allen-Cahn equations. Moreover, we also establish lower bounds for strong temporal and spatial approximation errors which demonstrate that our strong convergence rates are essentially sharp and can, in general, not be improved.
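To make the setting concrete, here is a minimal sketch of an explicit space-time discretisation of a stochastic Allen-Cahn equation, du = (u_xx + u - u^3) dt + dW, on [0, 1] with zero Dirichlet boundary values. It is not the scheme analysed in the paper: the finite-difference grid, the generic taming of the cubic term, the step sizes, and the names `allen_cahn_step` and `simulate` are all assumptions made for illustration.

```python
import math
import random


def allen_cahn_step(u, dt, dx, rng):
    """One explicit Euler-Maruyama step for du = (u_xx + u - u^3) dt + dW.

    `u` holds the interior grid values; the boundary values are fixed at zero.
    Illustrative only: this is not the scheme analysed in the paper.
    """
    n = len(u)
    # Finite-difference scaling of space-time white noise at one grid point.
    noise_scale = math.sqrt(dt / dx)
    new_u = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        laplacian = (left - 2.0 * u[i] + right) / dx**2
        reaction = u[i] - u[i] ** 3
        # Generic taming (an assumption): the cubic drift increment stays below 1
        # in absolute value, which guards against blow-up of the explicit step.
        tamed = dt * reaction / (1.0 + dt * abs(reaction))
        new_u.append(u[i] + dt * laplacian + tamed + noise_scale * rng.gauss(0.0, 1.0))
    return new_u


def simulate(n=31, steps=500, seed=0):
    """Run the explicit scheme from u0(x) = sin(pi x) on n interior grid points."""
    rng = random.Random(seed)
    dx = 1.0 / (n + 1)
    dt = 0.25 * dx**2  # respects the explicit stability restriction dt <= dx^2 / 2
    u = [math.sin(math.pi * (i + 1) * dx) for i in range(n)]
    for _ in range(steps):
        u = allen_cahn_step(u, dt, dx, rng)
    return u
```

Calling `simulate()` returns the grid values after 500 steps. The time step is tied to the square of the mesh width, which is the usual price of an explicit treatment of the Laplacian.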

NA · Dec 18, 2015
Semi-discretization for stochastic scalar conservation laws with multiple rough fluxes

Benjamin Gess, Benoît Perthame, Panagiotis E. Souganidis

We develop a semi-discretization approximation for scalar conservation laws with multiple rough time dependence in inhomogeneous fluxes. The method is based on Brenier's transport-collapse algorithm and uses characteristics defined in the setting of rough paths. We prove strong $L^1$-convergence for inhomogeneous fluxes and provide a rate of convergence for homogeneous ones. The approximation scheme and the proofs are based on the recently developed theory of pathwise entropy solutions and use the kinetic formulation, which allows the (rough) characteristics to be defined globally.

OC · Feb 7, 2023
Exponential convergence rates for momentum stochastic gradient descent in the overparametrized setting

Benjamin Gess, Sebastian Kassing

We prove explicit bounds on the exponential rate of convergence for the momentum stochastic gradient descent scheme (MSGD) for arbitrary, fixed hyperparameters (learning rate, friction parameter) and its continuous-in-time counterpart in the context of non-convex optimization. In the small step-size regime and in the case of flat minima or large noise intensities, these bounds prove faster convergence of MSGD compared to plain stochastic gradient descent (SGD). The results are shown for objective functions satisfying a local Polyak-Lojasiewicz inequality and under assumptions on the variance of MSGD that are satisfied in overparametrized settings. Moreover, we analyze the optimal choice of the friction parameter and show that the MSGD process almost surely converges to a local minimum.
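As a rough illustration of the scheme discussed in this abstract, the sketch below runs momentum SGD with a step size and a friction parameter on a small non-convex toy objective, f(a, b) = (ab - 1)^2 / 2, whose minimisers form the curve ab = 1. The update rule shown is one common parametrisation and may differ from the one used in the paper; the multiplicative noise model (chosen so that the gradient noise vanishes at minimisers, mimicking an overparametrized setting), the objective, and the names `loss`, `grad` and `msgd` are assumptions.

```python
import random


def loss(a, b):
    """Toy non-convex objective; its minimisers form the curve a * b = 1."""
    return 0.5 * (a * b - 1.0) ** 2


def grad(a, b):
    """Exact gradient of the toy objective."""
    r = a * b - 1.0
    return r * b, r * a


def msgd(step=0.1, friction=2.0, sigma=0.5, steps=2000, seed=0):
    """Momentum SGD in one common parametrisation (an assumption, not the paper's):

        v <- (1 - friction * step) * v - step * g
        x <- x + step * v

    The stochastic gradient g is the exact gradient times (1 + sigma * xi),
    so the noise vanishes wherever the gradient does.
    """
    rng = random.Random(seed)
    a, b = 1.5, 0.2
    va, vb = 0.0, 0.0
    for _ in range(steps):
        ga, gb = grad(a, b)
        scale = 1.0 + sigma * rng.gauss(0.0, 1.0)
        ga, gb = scale * ga, scale * gb
        va = (1.0 - friction * step) * va - step * ga
        vb = (1.0 - friction * step) * vb - step * gb
        a += step * va
        b += step * vb
    return a, b
```

Running `a, b = msgd()` and evaluating `loss(a, b)` gives a quick check that the iterates settle on the curve of minimisers. Varying `friction` while keeping `step` fixed is a simple way to explore the trade-off the abstract refers to.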

PR · Jul 12, 2022
Conservative SPDEs as fluctuating mean field limits of stochastic gradient descent

Benjamin Gess, Rishabh S. Gvalani, Vitalii Konarovskyi

The convergence of stochastic interacting particle systems in the mean-field limit to solutions of conservative stochastic partial differential equations is established, with optimal rate of convergence. As a second main result, a quantitative central limit theorem for such SPDEs is derived, again, with optimal rate of convergence. The results apply, in particular, to the convergence in the mean-field scaling of stochastic gradient descent dynamics in overparametrized, shallow neural networks to solutions of SPDEs. It is shown that the inclusion of fluctuations in the limiting SPDE improves the rate of convergence, and retains information about the fluctuations of stochastic gradient descent in the continuum limit.

PR · Feb 14, 2023
Stochastic Modified Flows, Mean-Field Limits and Dynamics of Stochastic Gradient Descent

Benjamin Gess, Sebastian Kassing, Vitalii Konarovskyi

We propose new limiting dynamics for stochastic gradient descent in the small learning rate regime called stochastic modified flows. These SDEs are driven by a cylindrical Brownian motion and improve the so-called stochastic modified equations by having regular diffusion coefficients and by matching the multi-point statistics. As a second contribution, we introduce distribution dependent stochastic modified flows which we prove to describe the fluctuating limiting dynamics of stochastic gradient descent in the small learning rate - infinite width scaling regime.

LG · Mar 10
Large Spikes in Stochastic Gradient Descent: A Large-Deviations View

Benjamin Gess, Daniel Heydecker

We analyse SGD training of a shallow, fully connected network in the NTK scaling and provide a quantitative theory of the catapult phase. We identify an explicit criterion separating two behaviours: When an explicit function $G$, depending only on the kernel, learning rate $\eta$ and data, is positive, SGD produces large NTK-flattening spikes with high probability; when $G<0$, their probability decays like $(n/\eta)^{-\vartheta/2}$, for an explicitly characterised $\vartheta\in (0,\infty)$. This yields a concrete parameter-dependent explanation for why such spikes may still be observed at practical widths.

LG · Feb 2, 2024
Stochastic Modified Flows for Riemannian Stochastic Gradient Descent

Benjamin Gess, Sebastian Kassing, Nimit Rana

We give quantitative estimates for the rate of convergence of Riemannian stochastic gradient descent (RSGD) to Riemannian gradient flow and to a diffusion process, the so-called Riemannian stochastic modified flow (RSMF). Using tools from stochastic differential geometry we show that, in the small learning rate regime, RSGD can be approximated by the solution to the RSMF driven by an infinite-dimensional Wiener process. The RSMF accounts for the random fluctuations of RSGD and thereby increases the order of approximation compared to the deterministic Riemannian gradient flow. RSGD is built using the concept of a retraction map, that is, a cost-efficient approximation of the exponential map, and we prove quantitative bounds for the weak error of the diffusion approximation under assumptions on the retraction map, the geometry of the manifold, and the random estimators of the gradient.
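The role of the retraction map can be seen in a small sketch of RSGD on the unit sphere, where the retraction is a renormalisation of the ambient-space update instead of the exact exponential map. The manifold, the linear objective f(x) = -<target, x>, the additive Gaussian gradient noise, and the names `retract`, `riemannian_grad` and `rsgd` are assumptions chosen for illustration; none of them is taken from the paper.

```python
import math
import random


def normalize(x):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in x))
    return [c / norm for c in x]


def retract(x, step):
    """Projection retraction on the unit sphere: move in the ambient space, then renormalise."""
    return normalize([xi + si for xi, si in zip(x, step)])


def riemannian_grad(x, euclid_grad):
    """Project the Euclidean gradient onto the tangent space of the sphere at x."""
    radial = sum(xi * gi for xi, gi in zip(x, euclid_grad))
    return [gi - radial * xi for xi, gi in zip(x, euclid_grad)]


def rsgd(target, lr=0.05, sigma=0.1, steps=2000, seed=0):
    """Minimise f(x) = -<target, x> over the unit sphere using noisy gradients."""
    rng = random.Random(seed)
    x = normalize([1.0, 1.0, 1.0])
    for _ in range(steps):
        # Random gradient estimator: exact Euclidean gradient plus Gaussian noise.
        noisy_grad = [-t + sigma * rng.gauss(0.0, 1.0) for t in target]
        g = riemannian_grad(x, noisy_grad)
        x = retract(x, [-lr * gi for gi in g])
    return x
```

For example, `rsgd([0.0, 0.0, 1.0])` returns a unit vector that fluctuates close to the minimiser (0, 0, 1). The renormalisation costs one norm computation per step, which is the sense in which a retraction is cheaper than following geodesics exactly.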

LG · Sep 23, 2025
THINNs: Thermodynamically Informed Neural Networks

Javier Castro, Benjamin Gess

Physics-Informed Neural Networks (PINNs) are a class of deep learning models aiming to approximate solutions of PDEs by training neural networks to minimize the residual of the equation. Focusing on non-equilibrium fluctuating systems, we propose a physically informed choice of penalization that is consistent with the underlying fluctuation structure, as characterized by a large deviations principle. This approach yields a novel formulation of PINNs in which the penalty term is chosen to penalize improbable deviations, rather than being selected heuristically. The resulting thermodynamically consistent extension of PINNs, termed THINNs, is subsequently analyzed by establishing analytical a posteriori estimates, and providing empirical comparisons to established penalization strategies.