LGFeb 10Code
Physics-informed diffusion models in spectral spaceDavide Gallon, Philippe von Wurstemberger, Patrick Cheridito et al.
We propose a methodology that combines generative latent diffusion models with physics-informed machine learning to generate solutions of parametric partial differential equations (PDEs) conditioned on partial observations, which includes, in particular, forward and inverse PDE problems. We learn the joint distribution of PDE parameters and solutions via a diffusion process in a latent space of scaled spectral representations, where Gaussian noise corresponds to functions with controlled regularity. This spectral formulation enables significant dimensionality reduction compared to grid-based diffusion models and ensures that the induced process in function space remains within a class of functions for which the PDE operators are well defined. Building on diffusion posterior sampling, we enforce physics-informed constraints and measurement conditions during inference, applying Adam-based updates at each diffusion step. We evaluate the proposed approach on Poisson, Helmholtz, and incompressible Navier--Stokes equations, demonstrating improved accuracy and computational efficiency compared with existing diffusion-based PDE solvers, which are state of the art for sparse observations. Code is available at https://github.com/deeplearningmethods/PISD.
NAMar 14, 2019
Overcoming the curse of dimensionality in the approximative pricing of financial derivatives with default risksMartin Hutzenthaler, Arnulf Jentzen, Philippe von Wurstemberger
Parabolic partial differential equations (PDEs) are widely used in the mathematical modeling of natural phenomena and man made complex systems. In particular, parabolic PDEs are a fundamental tool to determine fair prices of financial derivatives in the financial industry. The PDEs appearing in financial engineering applications are often nonlinear and high dimensional since the dimension typically corresponds to the number of considered financial assets. A major issue is that most approximation methods for nonlinear PDEs in the literature suffer under the so-called curse of dimensionality in the sense that the computational effort to compute an approximation with a prescribed accuracy grows exponentially in the dimension of the PDE or in the reciprocal of the prescribed approximation accuracy and nearly all approximation methods have not been shown not to suffer under the curse of dimensionality. Recently, a new class of approximation schemes for semilinear parabolic PDEs, termed full history recursive multilevel Picard (MLP) algorithms, were introduced and it was proven that MLP algorithms do overcome the curse of dimensionality for semilinear heat equations. In this paper we extend those findings to a more general class of semilinear PDEs including as special cases semilinear Black-Scholes equations used for the pricing of financial derivatives with default risks. More specifically, we introduce an MLP algorithm for the approximation of solutions of semilinear Black-Scholes equations and prove that the computational effort of our method grows at most polynomially both in the dimension and the reciprocal of the prescribed approximation accuracy. This is, to the best of our knowledge, the first result showing that the approximation of solutions of semilinear Black-Scholes equations is a polynomially tractable approximation problem.
NAJan 29, 2018
Strong error analysis for stochastic gradient descent optimization algorithmsArnulf Jentzen, Benno Kuckuck, Ariel Neufeld et al.
Stochastic gradient descent (SGD) optimization algorithms are key ingredients in a series of machine learning applications. In this article we perform a rigorous strong error analysis for SGD optimization algorithms. In particular, we prove for every arbitrarily small $\varepsilon \in (0,\infty)$ and every arbitrarily large $p\in (0,\infty)$ that the considered SGD optimization algorithm converges in the strong $L^p$-sense with order $\frac{1}{2}-\varepsilon$ to the global minimum of the objective function of the considered stochastic approximation problem under standard convexity-type assumptions on the objective function and relaxed assumptions on the moments of the stochastic errors appearing in the employed SGD optimization algorithm. The key ideas in our convergence proof are, first, to employ techniques from the theory of Lyapunov-type functions for dynamical systems to develop a general convergence machinery for SGD optimization algorithms based on such functions, then, to apply this general machinery to concrete Lyapunov-type functions with polynomial structures, and, thereafter, to perform an induction argument along the powers appearing in the Lyapunov-type functions in order to achieve for every arbitrarily large $ p \in (0,\infty) $ strong $ L^p $-convergence rates. This article also contains an extensive review of results on SGD optimization algorithms in the scientific literature.
LGOct 31, 2023
Mathematical Introduction to Deep Learning: Methods, Implementations, and TheoryArnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger
This book aims to provide an introduction to the topic of deep learning algorithms. We review essential components of deep learning algorithms in full mathematical detail including different artificial neural network (ANN) architectures (such as fully-connected feedforward ANNs, convolutional ANNs, recurrent ANNs, residual ANNs, and ANNs with batch normalization) and different optimization algorithms (such as the basic stochastic gradient descent (SGD) method, accelerated methods, and adaptive methods). We also cover several theoretical aspects of deep learning algorithms such as approximation capacities of ANNs (including a calculus for ANNs), optimization theory (including Kurdyka-Łojasiewicz inequalities), and generalization errors. In the last part of the book some deep learning approximation methods for PDEs are reviewed including physics-informed neural networks (PINNs) and deep Galerkin methods. We hope that this book will be useful for students and scientists who do not yet have any background in deep learning at all and would like to gain a solid foundation as well as for practitioners who would like to obtain a firmer mathematical understanding of the objects and methods considered in deep learning.
NAApr 30
Algorithmically Designed Artificial Neural Networks (ADANNs): Higher order deep operator learning for parametric partial differential equationsArnulf Jentzen, Adrian Riekert, Philippe von Wurstemberger
In this article we propose a new deep learning approach to approximate operators related to parametric partial differential equations (PDEs). In particular, we introduce a new strategy to design specific artificial neural network (ANN) architectures in conjunction with specific ANN initialization schemes which are tailor-made for the particular approximation problem under consideration. In the proposed approach we combine efficient classical numerical approximation techniques with deep operator learning methodologies. Specifically, we introduce customized adaptions of existing ANN architectures together with specialized initializations for these ANN architectures so that at initialization we have that the ANNs closely mimic a chosen efficient classical numerical algorithm for the considered approximation problem. The obtained ANN architectures and their initialization schemes are thus strongly inspired by numerical algorithms as well as by popular deep learning methodologies from the literature and in that sense we refer to the introduced ANNs in conjunction with their tailor-made initialization schemes as Algorithmically Designed Artificial Neural Networks (ADANNs). We numerically test the proposed ADANN methodology in the case of several parametric PDEs. In the tested numerical examples the ADANN methodology significantly outperforms existing classical approximation algorithms as well as existing deep operator learning methodologies from the literature.
OCNov 10, 2025
Adam symmetry theorem: characterization of the convergence of the stochastic Adam optimizerSteffen Dereich, Thang Do, Arnulf Jentzen et al.
Beside the standard stochastic gradient descent (SGD) method, the Adam optimizer due to Kingma & Ba (2014) is currently probably the best-known optimization method for the training of deep neural networks in artificial intelligence (AI) systems. Despite the popularity and the success of Adam it remains an \emph{open research problem} to provide a rigorous convergence analysis for Adam even for the class of strongly convex SOPs. In one of the main results of this work we establish convergence rates for Adam in terms of the number of gradient steps (convergence rate \nicefrac{1}{2} w.r.t. the size of the learning rate), the size of the mini-batches (convergence rate 1 w.r.t. the size of the mini-batches), and the size of the second moment parameter of Adam (convergence rate 1 w.r.t. the distance of the second moment parameter to 1) for the class of strongly convex SOPs. In a further main result of this work, which we refer to as \emph{Adam symmetry theorem}, we illustrate the optimality of the established convergence rates by proving for a special class of simple quadratic strongly convex SOPs that Adam converges as the number of gradient steps increases to infinity to the solution of the SOP (the unique minimizer of the strongly convex objective function) if and \emph{only} if the random variables in the SOP (the data in the SOP) are \emph{symmetrically distributed}. In particular, in the standard case where the random variables in the SOP are not symmetrically distributed we \emph{disprove} that Adam converges to the minimizer of the SOP as the number of Adam steps increases to infinity. We also complement the conclusions of our convergence analysis and the Adam symmetry theorem by several numerical simulations that indicate the sharpness of the established convergence rates and that illustrate the practical appearance of the phenomena revealed in the \emph{Adam symmetry theorem}.
LGDec 2, 2024
An overview of diffusion models for generative artificial intelligenceDavide Gallon, Arnulf Jentzen, Philippe von Wurstemberger
This article provides a mathematically rigorous introduction to denoising diffusion probabilistic models (DDPMs), sometimes also referred to as diffusion probabilistic models or diffusion models, for generative artificial intelligence. We provide a detailed basic mathematical framework for DDPMs and explain the main ideas behind training and generation procedures. In this overview article we also review selected extensions and improvements of the basic framework from the literature such as improved DDPMs, denoising diffusion implicit models, classifier-free diffusion guidance models, and latent diffusion models.
NASep 7, 2018
A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equationsPhilipp Grohs, Fabian Hornung, Arnulf Jentzen et al.
Artificial neural networks (ANNs) have very successfully been used in numerical simulations for a series of computational problems ranging from image classification/image recognition, speech recognition, time series analysis, game intelligence, and computational advertising to numerical approximations of partial differential equations (PDEs). Such numerical simulations suggest that ANNs have the capacity to very efficiently approximate high-dimensional functions and, especially, indicate that ANNs seem to admit the fundamental power to overcome the curse of dimensionality when approximating the high-dimensional functions appearing in the above named computational problems. There are a series of rigorous mathematical approximation results for ANNs in the scientific literature. Some of them prove convergence without convergence rates and some even rigorously establish convergence rates but there are only a few special cases where mathematical results can rigorously explain the empirical success of ANNs when approximating high-dimensional functions. The key contribution of this article is to disclose that ANNs can efficiently approximate high-dimensional functions in the case of numerical approximations of Black-Scholes PDEs. More precisely, this work reveals that the number of required parameters of an ANN to approximate the solution of the Black-Scholes PDE grows at most polynomially in both the reciprocal of the prescribed approximation accuracy $\varepsilon > 0$ and the PDE dimension $d \in \mathbb{N}$. We thereby prove, for the first time, that ANNs do indeed overcome the curse of dimensionality in the numerical approximation of Black-Scholes PDEs.
NAMar 22, 2018
Lower error bounds for the stochastic gradient descent optimization algorithm: Sharp convergence rates for slowly and fast decaying learning ratesArnulf Jentzen, Philippe von Wurstemberger
The stochastic gradient descent (SGD) optimization algorithm plays a central role in a series of machine learning applications. The scientific literature provides a vast amount of upper error bounds for the SGD method. Much less attention as been paid to proving lower error bounds for the SGD method. It is the key contribution of this paper to make a step in this direction. More precisely, in this article we establish for every $γ, ν\in (0,\infty)$ essentially matching lower and upper bounds for the mean square error of the SGD process with learning rates $(\fracγ{n^ν})_{n \in \mathbb{N}}$ associated to a simple quadratic stochastic optimization problem. This allows us to precisely quantify the mean square convergence rate of the SGD method in dependence on the asymptotic behavior of the learning rates.