PRJan 22, 2015
On Tamed Euler Approximations of SDEs Driven by Lévy Noise with Applications to Delay EquationsKonstantinos Dareiotis, Chaman Kumar, Sotirios Sabanis
We extend the taming techniques for explicit Euler approximations of stochastic differential equations (SDEs) driven by Lévy noise with super-linearly growing drift coefficients. Strong convergence results are presented for the case of locally Lipschitz coefficients. Moreover, rate of convergence results are obtained in agreement with classical literature when the local Lipschitz continuity assumptions are replaced by global and, in addition, the drift coefficients satisfy polynomial Lipschitz continuity. Finally, we further extend these techniques to the case of delay equations.
PRJan 11, 2016
On Milstein approximations with varying coefficients: the case of super-linear diffusion coefficientsChaman Kumar, Sotirios Sabanis
A new class of explicit Milstein schemes, which approximate stochastic differential equations (SDEs) with superlinearly growing drift and diffusion coefficients, is proposed in this article. It is shown, under very mild conditions, that these explicit schemes converge in $\mathcal L^p$ to the solution of the corresponding SDEs with optimal rate.
PRNov 10, 2016
On Explicit Approximations for Lévy Driven SDEs with Super-linear Diffusion CoefficientsChaman Kumar, Sotirios Sabanis
Motivated by the results of \cite{sabanis2015}, we propose explicit Euler-type schemes for SDEs with random coefficients driven by Lévy noise when the drift and diffusion coefficients can grow super-linearly. As an application of our results, one can construct explicit Euler-type schemes for SDEs with delays (SDDEs) which are driven by Lévy noise and have super-linear coefficients. Strong convergence results are established and their rate of convergence is shown to be equal to that of the classical Euler scheme. It is proved that the optimal rate of convergence is achieved for $\mathcal{L}^2$-convergence which is consistent with the corresponding results available in the literature.
LGNov 22, 2023
On diffusion-based generative models and their error bounds: The log-concave case with full convergence estimatesStefano Bruno, Ying Zhang, Dong-Young Lim et al.
We provide full theoretical guarantees for the convergence behaviour of diffusion-based generative models under the assumption of strongly log-concave data distributions while our approximating class of functions used for score estimation is made of Lipschitz continuous functions avoiding any Lipschitzness assumption on the score function. We demonstrate via a motivating example, sampling from a Gaussian distribution with unknown mean, the powerfulness of our approach. In this case, explicit estimates are provided for the associated optimization problem, i.e. score approximation, while these are combined with the corresponding sampling estimates. As a result, we obtain the best known upper bound estimates in terms of key quantities of interest, such as the dimension and rates of convergence, for the Wasserstein-2 distance between the data distribution (Gaussian with unknown mean) and our sampling algorithm. Beyond the motivating example and in order to allow for the use of a diverse range of stochastic optimizers, we present our results using an $L^2$-accurate score estimation assumption, which crucially is formed under an expectation with respect to the stochastic optimizer and our novel auxiliary process that uses only known information. This approach yields the best known convergence rate for our sampling algorithm.
PRJan 19, 2023
Kinetic Langevin MCMC Sampling Without Gradient Lipschitz Continuity -- the Strongly Convex CaseTim Johnston, Iosif Lytras, Sotirios Sabanis
In this article we consider sampling from log concave distributions in Hamiltonian setting, without assuming that the objective gradient is globally Lipschitz. We propose two algorithms based on monotone polygonal (tamed) Euler schemes, to sample from a target measure, and provide non-asymptotic 2-Wasserstein distance bounds between the law of the process of each algorithm and the target measure. Finally, we apply these results to bound the excess risk optimization error of the associated optimization problem.
OCOct 24, 2022
Langevin dynamics based algorithm e-TH$\varepsilon$O POULA for stochastic optimization problems with discontinuous stochastic gradientDong-Young Lim, Ariel Neufeld, Sotirios Sabanis et al.
We introduce a new Langevin dynamics based algorithm, called e-TH$\varepsilon$O POULA, to solve optimization problems with discontinuous stochastic gradients which naturally appear in real-world applications such as quantile estimation, vector quantization, CVaR minimization, and regularized optimization problems involving ReLU neural networks. We demonstrate both theoretically and numerically the applicability of the e-TH$\varepsilon$O POULA algorithm. More precisely, under the conditions that the stochastic gradient is locally Lipschitz in average and satisfies a certain convexity at infinity condition, we establish non-asymptotic error bounds for e-TH$\varepsilon$O POULA in Wasserstein distances and provide a non-asymptotic estimate for the expected excess risk, which can be controlled to be arbitrarily small. Three key applications in finance and insurance are provided, namely, multi-period portfolio optimization, transfer learning in multi-period portfolio optimization, and insurance claim prediction, which involve neural networks with (Leaky)-ReLU activation functions. Numerical experiments conducted using real-world datasets illustrate the superior empirical performance of e-TH$\varepsilon$O POULA compared to SGLD, TUSLA, ADAM, and AMSGrad in terms of model accuracy.
LGMay 6, 2025
Wasserstein Convergence of Score-based Generative Models under Semiconvexity and Discontinuous GradientsStefano Bruno, Sotirios Sabanis
Score-based Generative Models (SGMs) approximate a data distribution by perturbing it with Gaussian noise and subsequently denoising it via a learned reverse diffusion process. These models excel at modeling complex data distributions and generating diverse samples, achieving state-of-the-art performance across domains such as computer vision, audio generation, reinforcement learning, and computational biology. Despite their empirical success, existing Wasserstein-2 convergence analysis typically assume strong regularity conditions-such as smoothness or strict log-concavity of the data distribution-that are rarely satisfied in practice. In this work, we establish the first non-asymptotic Wasserstein-2 convergence guarantees for SGMs targeting semiconvex distributions with potentially discontinuous gradients. Our upper bounds are explicit and sharp in key parameters, achieving optimal dependence of $O(\sqrt{d})$ on the data dimension $d$ and convergence rate of order one. The framework accommodates a wide class of practically relevant distributions, including symmetric modified half-normal distributions, Gaussian mixtures, double-well potentials, and elastic net potentials. By leveraging semiconvexity without requiring smoothness assumptions on the potential such as differentiability, our results substantially broaden the theoretical foundations of SGMs, bridging the gap between empirical success and rigorous guarantees in non-smooth, complex data regimes.
CESep 15, 2025
FinGEAR: Financial Mapping-Guided Enhanced Answer RetrievalYing Li, Mengyu Wang, Miguel de Carvalho et al.
Financial disclosures such as 10-K filings present challenging retrieval problems due to their length, regulatory section hierarchy, and domain-specific language, which standard retrieval-augmented generation (RAG) models underuse. We introduce FinGEAR (Financial Mapping-Guided Enhanced Answer Retrieval), a retrieval framework tailored to financial documents. FinGEAR combines a finance lexicon for Item-level guidance (FLAM), dual hierarchical indices for within-Item search (Summary Tree and Question Tree), and a two-stage cross-encoder reranker. This design aligns retrieval with disclosure structure and terminology, enabling fine-grained, query-aware context selection. Evaluated on full 10-Ks with queries aligned to the FinQA dataset, FinGEAR delivers consistent gains in precision, recall, F1, and relevancy, improving F1 by up to 56.7% over flat RAG, 12.5% over graph-based RAGs, and 217.6% over prior tree-based systems, while also increasing downstream answer accuracy with a fixed reader. By jointly modeling section hierarchy and domain lexicon signals, FinGEAR improves retrieval fidelity and provides a practical foundation for high-stakes financial analysis.
LGOct 2, 2025
Flatness-Aware Stochastic Gradient Langevin DynamicsStefano Bruno, Youngsik Hwang, Jaehyeon An et al.
Generalization in deep learning is closely tied to the pursuit of flat minima in the loss landscape, yet classical Stochastic Gradient Langevin Dynamics (SGLD) offers no mechanism to bias its dynamics toward such low-curvature solutions. This work introduces Flatness-Aware Stochastic Gradient Langevin Dynamics (fSGLD), designed to efficiently and provably seek flat minima in high-dimensional nonconvex optimization problems. At each iteration, fSGLD uses the stochastic gradient evaluated at parameters perturbed by isotropic Gaussian noise, commonly referred to as Random Weight Perturbation (RWP), thereby optimizing a randomized-smoothing objective that implicitly captures curvature information. Leveraging these properties, we prove that the invariant measure of fSGLD stays close to a stationary measure concentrated on the global minimizers of a loss function regularized by the Hessian trace whenever the inverse temperature and the scale of random weight perturbation are properly coupled. This result provides a rigorous theoretical explanation for the benefits of random weight perturbation. In particular, we establish non-asymptotic convergence guarantees in Wasserstein distance with the best known rate and derive an excess-risk bound for the Hessian-trace regularized objective. Extensive experiments on noisy-label and large-scale vision tasks, in both training-from-scratch and fine-tuning settings, demonstrate that fSGLD achieves superior or comparable generalization and robustness to baseline algorithms while maintaining the computational cost of SGD, about half that of SAM. Hessian-spectrum analysis further confirms that fSGLD converges to significantly flatter minima.
CLOct 1, 2025
One More Question is Enough, Expert Question Decomposition (EQD) Model for Domain Quantitative ReasoningMengyu Wang, Sotirios Sabanis, Miguel de Carvalho et al.
Domain-specific quantitative reasoning remains a major challenge for large language models (LLMs), especially in fields requiring expert knowledge and complex question answering (QA). In this work, we propose Expert Question Decomposition (EQD), an approach designed to balance the use of domain knowledge with computational efficiency. EQD is built on a two-step fine-tuning framework and guided by a reward function that measures the effectiveness of generated sub-questions in improving QA outcomes. It requires only a few thousand training examples and a single A100 GPU for fine-tuning, with inference time comparable to zero-shot prompting. Beyond its efficiency, EQD outperforms state-of-the-art domain-tuned models and advanced prompting strategies. We evaluate EQD in the financial domain, characterized by specialized knowledge and complex quantitative reasoning, across four benchmark datasets. Our method consistently improves QA performance by 0.6% to 10.5% across different LLMs. Our analysis reveals an important insight: in domain-specific QA, a single supporting question often provides greater benefit than detailed guidance steps.
STJun 5, 2025
kTULA: A Langevin sampling algorithm with improved KL bounds under super-linear log-gradientsIosif Lytras, Sotirios Sabanis, Ying Zhang
Motivated by applications in deep learning, where the global Lipschitz continuity condition is often not satisfied, we examine the problem of sampling from distributions with super-linearly growing log-gradients. We propose a novel tamed Langevin dynamics-based algorithm, called kTULA, to solve the aforementioned sampling problem, and provide a theoretical guarantee for its performance. More precisely, we establish a non-asymptotic convergence bound in Kullback-Leibler (KL) divergence with the best-known rate of convergence equal to $2-\overlineε$, $\overlineε>0$, which significantly improves relevant results in existing literature. This enables us to obtain an improved non-asymptotic error bound in Wasserstein-2 distance, which can be used to further derive a non-asymptotic guarantee for kTULA to solve the associated optimization problems. To illustrate the applicability of kTULA, we apply the proposed algorithm to the problem of sampling from a high-dimensional double-well potential distribution and to an optimization problem involving a neural network. We show that our main results can be used to provide theoretical guarantees for the performance of kTULA.
COOct 21, 2021
Statistical Finite Elements via Langevin DynamicsÖmer Deniz Akyildiz, Connor Duffin, Sotirios Sabanis et al.
The recent statistical finite element method (statFEM) provides a coherent statistical framework to synthesise finite element models with observed data. Through embedding uncertainty inside of the governing equations, finite element solutions are updated to give a posterior distribution which quantifies all sources of uncertainty associated with the model. However to incorporate all sources of uncertainty, one must integrate over the uncertainty associated with the model parameters, the known forward problem of uncertainty quantification. In this paper, we make use of Langevin dynamics to solve the statFEM forward problem, studying the utility of the unadjusted Langevin algorithm (ULA), a Metropolis-free Markov chain Monte Carlo sampler, to build a sample-based characterisation of this otherwise intractable measure. Due to the structure of the statFEM problem, these methods are able to solve the forward problem without explicit full PDE solves, requiring only sparse matrix-vector products. ULA is also gradient-based, and hence provides a scalable approach up to high degrees-of-freedom. Leveraging the theory behind Langevin-based samplers, we provide theoretical guarantees on sampler performance, demonstrating convergence, for both the prior and posterior, in the Kullback-Leibler divergence, and, in Wasserstein-2, with further results on the effect of preconditioning. Numerical experiments are also provided, for both the prior and posterior, to demonstrate the efficacy of the sampler, with a Python package also included.
OCJul 19, 2021
Non-asymptotic estimates for TUSLA algorithm for non-convex learning with applications to neural networks with ReLU activation functionDong-Young Lim, Ariel Neufeld, Sotirios Sabanis et al.
We consider non-convex stochastic optimization problems where the objective functions have super-linearly growing and discontinuous stochastic gradients. In such a setting, we provide a non-asymptotic analysis for the tamed unadjusted stochastic Langevin algorithm (TUSLA) introduced in Lovas et al. (2020). In particular, we establish non-asymptotic error bounds for the TUSLA algorithm in Wasserstein-1 and Wasserstein-2 distances. The latter result enables us to further derive non-asymptotic estimates for the expected excess risk. To illustrate the applicability of the main results, we consider an example from transfer learning with ReLU neural networks, which represents a key paradigm in machine learning. Numerical experiments are presented for the aforementioned example which support our theoretical findings. Hence, in this setting, we demonstrate both theoretically and numerically that the TUSLA algorithm can solve the optimization problem involving neural networks with ReLU activation function. Besides, we provide simulation results for synthetic examples where popular algorithms, e.g. ADAM, AMSGrad, RMSProp, and (vanilla) stochastic gradient descent (SGD) algorithm, may fail to find the minimizer of the objective functions due to the super-linear growth and the discontinuity of the corresponding stochastic gradient, while the TUSLA algorithm converges rapidly to the optimal solution. Moreover, we provide an empirical comparison of the performance of TUSLA with popular stochastic optimizers on real-world datasets, as well as investigate the effect of the key hyperparameters of TUSLA on its performance.
LGMay 28, 2021
Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networksDong-Young Lim, Sotirios Sabanis
We present a new class of Langevin based algorithms, which overcomes many of the known shortcomings of popular adaptive optimizers that are currently used for the fine tuning of deep learning models. Its underpinning theory relies on recent advances of Euler's polygonal approximations for stochastic differential equations (SDEs) with monotone coefficients. As a result, it inherits the stability properties of tamed algorithms, while it addresses other known issues, e.g. vanishing gradients in neural networks. In particular, we provide a nonasymptotic analysis and full theoretical guarantees for the convergence properties of an algorithm of this novel class, which we named TH$\varepsilon$O POULA (or, simply, TheoPouLa). Finally, several experiments are presented with different types of deep learning models, which show the superior performance of TheoPouLa over many popular adaptive optimization algorithms.
LGJun 25, 2020
Taming neural networks with TUSLA: Non-convex learning via adaptive stochastic gradient Langevin algorithmsAttila Lovas, Iosif Lytras, Miklós Rásonyi et al.
Artificial neural networks (ANNs) are typically highly nonlinear systems which are finely tuned via the optimization of their associated, non-convex loss functions. In many cases, the gradient of any such loss function has superlinear growth, making the use of the widely-accepted (stochastic) gradient descent methods, which are based on Euler numerical schemes, problematic. We offer a new learning algorithm based on an appropriately constructed variant of the popular stochastic gradient Langevin dynamics (SGLD), which is called tamed unadjusted stochastic Langevin algorithm (TUSLA). We also provide a nonasymptotic analysis of the new algorithm's convergence properties in the context of non-convex learning problems with the use of ANNs. Thus, we provide finite-time guarantees for TUSLA to find approximate minimizers of both empirical and population risks. The roots of the TUSLA algorithm are based on the taming technology for diffusion processes with superlinear coefficients as developed in \citet{tamed-euler, SabanisAoAP} and for MCMC algorithms in \citet{tula}. Numerical experiments are presented which confirm the theoretical findings and illustrate the need for the use of the new algorithm in comparison to vanilla SGLD within the framework of ANNs.
OCFeb 13, 2020
Nonasymptotic analysis of Stochastic Gradient Hamiltonian Monte Carlo under local conditions for nonconvex optimizationÖmer Deniz Akyildiz, Sotirios Sabanis
We provide a nonasymptotic analysis of the convergence of the stochastic gradient Hamiltonian Monte Carlo (SGHMC) to a target measure in Wasserstein-2 distance without assuming log-concavity. Our analysis quantifies key theoretical properties of the SGHMC as a sampler under local conditions which significantly improves the findings of previous results. In particular, we prove that the Wasserstein-2 distance between the target and the law of the SGHMC is uniformly controlled by the step-size of the algorithm, therefore demonstrate that the SGHMC can provide high-precision results uniformly in the number of iterations. The analysis also allows us to obtain nonasymptotic bounds for nonconvex optimization problems under local conditions and implies that the SGHMC, when viewed as a nonconvex optimizer, converges to a global minimum with the best known rates. We apply our results to obtain nonasymptotic bounds for scalable Bayesian inference and nonasymptotic generalization bounds.
STOct 4, 2019
Nonasymptotic estimates for Stochastic Gradient Langevin Dynamics under local conditions in nonconvex optimizationYing Zhang, Ömer Deniz Akyildiz, Theodoros Damoulas et al.
In this paper, we are concerned with a non-asymptotic analysis of sampling algorithms used in nonconvex optimization. In particular, we obtain non-asymptotic estimates in Wasserstein-1 and Wasserstein-2 distances for a popular class of algorithms called Stochastic Gradient Langevin Dynamics (SGLD). In addition, the aforementioned Wasserstein-2 convergence result can be applied to establish a non-asymptotic error bound for the expected excess risk. Crucially, these results are obtained under a local Lipschitz condition and a local dissipativity condition where we remove the uniform dependence in the data stream. We illustrate the importance of this relaxation by presenting examples from variational inference and from index tracking optimization.
STMay 30, 2019
On stochastic gradient Langevin dynamics with dependent data streams: the fully non-convex caseNgoc Huy Chau, Éric Moulines, Miklos Rásonyi et al.
We consider the problem of sampling from a target distribution, which is \emph {not necessarily logconcave}, in the context of empirical risk minimization and stochastic optimization as presented in Raginsky et al. (2017). Non-asymptotic analysis results are established in the $L^1$-Wasserstein distance for the behaviour of Stochastic Gradient Langevin Dynamics (SGLD) algorithms. We allow the estimation of gradients to be performed even in the presence of \emph{dependent} data streams. Our convergence estimates are sharper and \emph{uniform} in the number of iterations, in contrast to those in previous studies.
PRSep 13, 2018
On explicit order 1.5 approximations with varying coefficients: the case of super-linear diffusion coefficientsSotirios Sabanis, Ying Zhang
A conjecture appears in \cite{milsteinscheme}, in the form of a remark, where it is stated that it is possible to construct, in a specified way, any high order explicit numerical schemes to approximate the solutions of SDEs with superlinear coefficients. We answer this conjecture affirmatively for the case of order 1.5 approximations and show that the suggested methodology works. Moreover, we explore the case of having Hölder continuous derivatives for the diffusion coefficients.
PRSep 2, 2016
Euler approximations with varying coefficients: The case of superlinearly growing diffusion coefficientsSotirios Sabanis
A new class of explicit Euler schemes, which approximate stochastic differential equations (SDEs) with superlinearly growing drift and diffusion coefficients, is proposed in this article. It is shown, under very mild conditions, that these explicit schemes converge in probability and in $\mathcal{L}^p$ to the solution of the corresponding SDEs. Moreover, rate of convergence estimates are provided for $\mathcal{L}^p$ and almost sure convergence. In particular, the strong order $1/2$ is recovered in the case of uniform $\mathcal{L}^p$-convergence.
PRAug 13, 2015
Convergence of tamed Euler schemes for a class of stochastic evolution equationsIstván Gyöngy, Sotirios Sabanis, David Šiška
We prove stability and convergence of a full discretization for a class of stochastic evolution equations with super-linearly growing operators appearing in the drift term. This is done using the recently developed tamed Euler method, which uses a fully explicit time stepping, coupled with a Galerkin scheme for the spatial discretization.