OC Dec 16, 2017
Time-Optimal Collaborative Guidance Using the Generalized Hopf Formula
Matthew R. Kirchner, Robert Mar, Gary Hewer et al.
Presented is a new method for calculating the time-optimal guidance control for a multiple vehicle pursuit-evasion system. A joint differential game of k pursuing vehicles relative to the evader is constructed, and a Hamilton-Jacobi-Isaacs (HJI) equation that describes the evolution of the value function is formulated. The value function is built such that the terminal cost is the squared distance from the boundary of the terminal surface. Additionally, all vehicles are assumed to have bounded controls. Typically, a joint state space constructed in this way would have too large a dimension to be solved with existing grid-based approaches. The value function is computed efficiently in high-dimensional space, without a discrete grid, using the generalized Hopf formula. The optimal time-to-reach is iteratively solved, and the optimal control is inferred from the gradient of the value function.
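As a rough illustration of the grid-free idea described in this abstract, the sketch below evaluates the classical, time-independent Hopf formula φ(x, t) = −min_p { J*(p) + t·H(p) − ⟨x, p⟩ } at a single point by gradient descent over p. The quadratic choices J(x) = ½|x|² and H(p) = ½|p|², the function names, and the step size are all assumptions picked so the result can be checked against the closed form |x|² / (2(1 + t)). It is not the authors' solver, and the generalized formula in the paper covers a broader setting.

```python
import numpy as np

def hopf_objective(p, x, t):
    # J*(p) + t*H(p) - <x, p> with the assumed J*(p) = H(p) = 0.5*|p|^2
    return 0.5 * (1.0 + t) * np.dot(p, p) - np.dot(x, p)

def hopf_gradient(p, x, t):
    # Gradient of hopf_objective with respect to p
    return (1.0 + t) * p - x

def hopf_value(x, t, step=0.1, iters=2000):
    """Evaluate phi(x, t) at one point; no spatial grid is built."""
    p = np.zeros_like(x)
    for _ in range(iters):
        p = p - step * hopf_gradient(p, x, t)
    # The Hopf formula returns minus the minimal objective value.
    return -hopf_objective(p, x, t), p

# A 50-dimensional query point, far beyond what a dense grid could cover.
x = np.ones(50)
value, p_star = hopf_value(x, t=1.0)
closed_form = np.dot(x, x) / (2.0 * (1.0 + 1.0))
```

In this smooth example the minimizer `p_star` equals the spatial gradient of the value function, x / (1 + t), which is the quantity the abstract says the optimal control is inferred from. The fixed step size only converges when it is below 2 / (1 + t), so it would need adjusting for larger t.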
LG Nov 13, 2023
Leveraging Hamilton-Jacobi PDEs with time-dependent Hamiltonians for continual scientific machine learning
Paula Chen, Tingwei Meng, Zongren Zou et al.
We address two major challenges in scientific machine learning (SciML): interpretability and computational efficiency. We increase the interpretability of certain learning processes by establishing a new theoretical connection between optimization problems arising from SciML and a generalized Hopf formula, which represents the viscosity solution to a Hamilton-Jacobi partial differential equation (HJ PDE) with time-dependent Hamiltonian. Namely, we show that when we solve certain regularized learning problems with integral-type losses, we actually solve an optimal control problem and its associated HJ PDE with time-dependent Hamiltonian. This connection allows us to reinterpret incremental updates to learned models as the evolution of an associated HJ PDE and optimal control problem in time, where all of the previous information is intrinsically encoded in the solution to the HJ PDE. As a result, existing HJ PDE solvers and optimal control algorithms can be reused to design new efficient training approaches for SciML that naturally coincide with the continual learning framework, while avoiding catastrophic forgetting. As a first exploration of this connection, we consider the special case of linear regression and leverage our connection to develop a new Riccati-based methodology for solving these learning problems that is amenable to continual learning applications. We also provide some corresponding numerical examples that demonstrate the potential computational and memory advantages our Riccati-based approach can provide.
LG Mar 22, 2023
Leveraging Multi-time Hamilton-Jacobi PDEs for Certain Scientific Machine Learning Problems
Paula Chen, Tingwei Meng, Zongren Zou et al.
Hamilton-Jacobi partial differential equations (HJ PDEs) have deep connections with a wide range of fields, including optimal control, differential games, and imaging sciences. By considering the time variable to be a higher dimensional quantity, HJ PDEs can be extended to the multi-time case. In this paper, we establish a novel theoretical connection between specific optimization problems arising in machine learning and the multi-time Hopf formula, which corresponds to a representation of the solution to certain multi-time HJ PDEs. Through this connection, we increase the interpretability of the training process of certain machine learning applications by showing that when we solve these learning problems, we also solve a multi-time HJ PDE and, by extension, its corresponding optimal control problem. As a first exploration of this connection, we develop the relation between the regularized linear regression problem and the Linear Quadratic Regulator (LQR). We then leverage our theoretical connection to adapt standard LQR solvers (namely, those based on the Riccati ordinary differential equations) to design new training approaches for machine learning. Finally, we provide some numerical examples that demonstrate the versatility and possible computational advantages of our Riccati-based approach in the context of continual learning, post-training calibration, transfer learning, and sparse dynamics identification.
LG Sep 15, 2024
HJ-sampler: A Bayesian sampler for inverse problems of a stochastic process by leveraging Hamilton-Jacobi PDEs and score-based generative models
Tingwei Meng, Zongren Zou, Jérôme Darbon et al.
The interplay between stochastic processes and optimal control has been extensively explored in the literature. With the recent surge in the use of diffusion models, stochastic processes have increasingly been applied to sample generation. This paper builds on the log transform, known as the Cole-Hopf transform in Brownian motion contexts, and extends it within a more abstract framework that includes a linear operator. Within this framework, we found that the well-known relationship between the Cole-Hopf transform and optimal transport is a particular instance where the linear operator acts as the infinitesimal generator of a stochastic process. We also introduce a novel scenario where the linear operator is the adjoint of the generator, linking to Bayesian inference under specific initial and terminal conditions. Leveraging this theoretical foundation, we develop a new algorithm, named the HJ-sampler, for Bayesian inference for the inverse problem of a stochastic differential equation with given terminal observations. The HJ-sampler involves two stages: (1) solving the viscous Hamilton-Jacobi partial differential equations, and (2) sampling from the associated stochastic optimal control problem. Our proposed algorithm naturally allows for flexibility in selecting the numerical solver for viscous HJ PDEs. We introduce two variants of the solver: the Riccati-HJ-sampler, based on the Riccati method, and the SGM-HJ-sampler, which utilizes diffusion models. We demonstrate the effectiveness and flexibility of the proposed methods by applying them to solve Bayesian inverse problems involving various stochastic processes and prior distributions, including applications that address model misspecifications and quantifying model uncertainty.
LG Jan 22, 2025
Optimizing the Optimizer for Physics-Informed Neural Networks and Kolmogorov-Arnold Networks
Elham Kiyani, Khemraj Shukla, Jorge F. Urbán et al.
Physics-Informed Neural Networks (PINNs) have revolutionized the computation of PDE solutions by integrating partial differential equations (PDEs) into the neural network's training process as soft constraints, becoming an important component of the scientific machine learning (SciML) ecosystem. More recently, physics-informed Kolmogorov-Arnold networks (PIKANs) have also been shown to be effective and comparable in accuracy with PINNs. In their current implementation, both PINNs and PIKANs are mainly optimized using first-order methods like Adam, as well as quasi-Newton methods such as BFGS and its low-memory variant, L-BFGS. However, these optimizers often struggle with highly non-linear and non-convex loss landscapes, leading to challenges such as slow convergence, local minima entrapment, and (non)degenerate saddle points. In this study, we investigate the performance of Self-Scaled BFGS (SSBFGS), Self-Scaled Broyden (SSBroyden) methods and other advanced quasi-Newton schemes, including BFGS and L-BFGS with different line search strategies. These methods dynamically rescale updates based on historical gradient information, thus enhancing training efficiency and accuracy. We systematically compare these optimizers using both PINNs and PIKANs on key challenging PDEs, including the Burgers, Allen-Cahn, Kuramoto-Sivashinsky, Ginzburg-Landau, and Stokes equations. Additionally, we evaluate the performance of SSBFGS and SSBroyden for Deep Operator Network (DeepONet) architectures, demonstrating their effectiveness for data-driven operator learning. Our findings provide state-of-the-art results with orders-of-magnitude accuracy improvements without the use of adaptive weights or any other enhancements typically employed in PINNs.
LG Apr 12, 2024
Leveraging viscous Hamilton-Jacobi PDEs for uncertainty quantification in scientific machine learning
Zongren Zou, Tingwei Meng, Paula Chen et al.
Uncertainty quantification (UQ) in scientific machine learning (SciML) combines the strong predictive power of SciML with methods for quantifying the reliability of the learned models. However, two major challenges remain: limited interpretability and expensive training procedures. We provide a new interpretation for UQ problems by establishing a new theoretical connection between some Bayesian inference problems arising in SciML and viscous Hamilton-Jacobi partial differential equations (HJ PDEs). Namely, we show that the posterior mean and covariance can be recovered from the spatial gradient and Hessian of the solution to a viscous HJ PDE. As a first exploration of this connection, we specialize to Bayesian inference problems with linear models, Gaussian likelihoods, and Gaussian priors. In this case, the associated viscous HJ PDEs can be solved using Riccati ODEs, and we develop a new Riccati-based methodology that provides computational advantages when continuously updating the model predictions. Specifically, our Riccati-based approach can efficiently add or remove data points to the training set invariant to the order of the data and continuously tune hyperparameters. Moreover, neither update requires retraining on or access to previously incorporated data. We provide several examples from SciML involving noisy data and epistemic uncertainty to illustrate the potential advantages of our approach. In particular, this approach's amenability to data streaming applications demonstrates its potential for real-time inferences, which, in turn, allows for applications in which the predicted uncertainty is used to dynamically alter the learning process.
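To make the add-or-remove property concrete for the linear-Gaussian case named in this abstract, here is a minimal sketch using the textbook information-form update for Bayesian linear regression. It is not the Riccati-based method of the paper; the class name, the zero-mean isotropic prior, and the default variances are assumptions for illustration only.

```python
import numpy as np

class GaussianLinearPosterior:
    """Posterior over weights w for y = phi @ w + noise, noise ~ N(0, noise_var).

    Assumed prior: w ~ N(0, prior_var * I). State is kept in information form,
    so each data point contributes an additive term that can later be subtracted.
    """

    def __init__(self, dim, prior_var=1.0, noise_var=0.1):
        self.noise_var = noise_var
        self.precision = np.eye(dim) / prior_var  # inverse covariance
        self.shift = np.zeros(dim)                # precision @ mean

    def add(self, phi, y):
        # Incorporate one observation without touching earlier data.
        self.precision += np.outer(phi, phi) / self.noise_var
        self.shift += phi * y / self.noise_var

    def remove(self, phi, y):
        # Undo a previously added observation.
        self.precision -= np.outer(phi, phi) / self.noise_var
        self.shift -= phi * y / self.noise_var

    def mean(self):
        return np.linalg.solve(self.precision, self.shift)

    def covariance(self):
        return np.linalg.inv(self.precision)

# Three hypothetical observations (features, target).
data = [
    (np.array([1.0, 0.0]), 1.0),
    (np.array([1.0, 1.0]), 2.1),
    (np.array([1.0, 2.0]), 2.9),
]

streamed = GaussianLinearPosterior(dim=2)
for phi, y in data:
    streamed.add(phi, y)
streamed.remove(*data[1])  # drop the middle point afterwards

batch = GaussianLinearPosterior(dim=2)
for phi, y in reversed([data[0], data[2]]):  # different order, same two points
    batch.add(phi, y)
```

Because every observation enters as a sum, `streamed` and `batch` end up with the same posterior mean and covariance regardless of insertion order, and neither update needs access to the other stored points.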
ML Mar 11, 2024
Efficient first-order algorithms for large-scale, non-smooth maximum entropy models with application to wildfire science
Gabriel P. Langlois, Jatan Buch, Jérôme Darbon
Maximum entropy (Maxent) models are a class of statistical models that use the maximum entropy principle to estimate probability distributions from data. Due to the size of modern data sets, Maxent models need efficient optimization algorithms to scale well for big data applications. State-of-the-art algorithms for Maxent models, however, were not originally designed to handle big data sets; these algorithms either rely on technical devices that may yield unreliable numerical results, scale poorly, or require smoothness assumptions that many practical Maxent models lack. In this paper, we present novel optimization algorithms that overcome the shortcomings of state-of-the-art algorithms for training large-scale, non-smooth Maxent models. Our proposed first-order algorithms leverage the Kullback-Leibler divergence to train large-scale and non-smooth Maxent models efficiently. For Maxent models with a discrete probability distribution of $n$ elements built from samples, each containing $m$ features, the stepsize parameter estimation and the iterations in our algorithms scale on the order of $O(mn)$ operations and can be trivially parallelized. Moreover, the strong $\ell_{1}$ convexity of the Kullback-Leibler divergence allows for larger stepsize parameters, thereby speeding up the convergence rate of our algorithms. To illustrate the efficiency of our novel algorithms, we consider the problem of estimating probabilities of fire occurrences as a function of ecological features in the Western US MTBS-Interagency wildfire data set. Our numerical results show that our algorithms outperform the state of the art by one order of magnitude and yield results that agree with physical models of wildfire occurrence and previous statistical analyses of wildfire drivers.
LG Sep 17, 2025
A Variational Framework for Residual-Based Adaptivity in Neural PDE Solvers and Operator Learning
Juan Diego Toscano, Daniel T. Chen, Vivek Oommen et al.
Residual-based adaptive strategies are widely used in scientific machine learning but remain largely heuristic. We introduce a unifying variational framework that formalizes these methods by integrating convex transformations of the residual. Different transformations correspond to distinct objective functionals: exponential weights target the minimization of uniform error, while linear weights recover the minimization of quadratic error. Within this perspective, adaptive weighting is equivalent to selecting sampling distributions that optimize the primal objective, thereby linking discretization choices directly to error metrics. This principled approach yields three benefits: (1) it enables systematic design of adaptive schemes across norms, (2) reduces discretization error through variance reduction of the loss estimator, and (3) enhances learning dynamics by improving the gradient signal-to-noise ratio. Extending the framework to operator learning, we demonstrate substantial performance gains across optimizers and architectures. Our results provide a theoretical justification of residual-based adaptivity and establish a foundation for principled discretization and training strategies.
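The correspondence stated in this abstract between weight shapes and error norms can be checked numerically on a toy residual vector. The sketch below is only an illustration under assumed normalizations (softmax weights with a sharpness parameter `beta`, unnormalized linear weights); the function names and the sample residuals are hypothetical and are not taken from the paper's framework.

```python
import numpy as np

def exponential_weights(residuals, beta):
    # Softmax over absolute residuals; shifting by the max keeps exp() stable.
    r = np.abs(residuals)
    w = np.exp(beta * (r - r.max()))
    return w / w.sum()

def linear_weights(residuals):
    # Weights proportional to the absolute residual (left unnormalized).
    return np.abs(residuals)

residuals = np.array([0.1, -0.5, 2.0])
abs_r = np.abs(residuals)

# Exponential weights concentrate on the worst point, so the weighted
# residual approaches the uniform (max) error as beta grows.
w_exp = exponential_weights(residuals, beta=50.0)
weighted_uniform = np.sum(w_exp * abs_r)

# Linear weights turn the averaged weighted residual into the mean squared error.
weighted_quadratic = np.mean(linear_weights(residuals) * abs_r)
```

With these inputs `weighted_uniform` is numerically indistinguishable from the largest absolute residual, and `weighted_quadratic` equals `np.mean(residuals**2)`.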
OC Jul 8, 2025
A fast algorithm for solving the lasso problem exactly without homotopy using differential inclusions
Gabriel P. Langlois, Jérôme Darbon
We prove in this work that the well-known lasso problem can be solved exactly without homotopy using novel differential inclusions techniques. Specifically, we show that a selection principle from the theory of differential inclusions transforms the dual lasso problem into the problem of calculating the trajectory of a projected dynamical system that we prove is integrable. Our analysis yields an exact algorithm for the lasso problem, numerically up to machine precision, that is amenable to computing regularization paths and is very fast. Moreover, we show the continuation of solutions to the integrable projected dynamical system in terms of the hyperparameter naturally yields a rigorous homotopy algorithm. Numerical experiments confirm that our algorithm outperforms the state-of-the-art algorithms in both efficiency and accuracy. Beyond this work, we expect our results and analysis can be adapted to compute exact or approximate solutions to a broader class of polyhedral-constrained optimization problems.
OC Jan 14, 2022
SympOCnet: Solving optimal control problems with applications to high-dimensional multi-agent path planning problems
Tingwei Meng, Zhen Zhang, Jérôme Darbon et al.
Solving high-dimensional optimal control problems in real-time is an important but challenging problem, with applications to multi-agent path planning problems, which have drawn increased attention given the growing popularity of drones in recent years. In this paper, we propose a novel neural network method called SympOCnet that applies the Symplectic network to solve high-dimensional optimal control problems with state constraints. We present several numerical results on path planning problems in two-dimensional and three-dimensional spaces. Specifically, we demonstrate that our SympOCnet can solve a problem with more than 500 dimensions in 1.5 hours on a single GPU, which shows the effectiveness and efficiency of SympOCnet. The proposed method is scalable and has the potential to solve truly high-dimensional path planning problems in real-time.
OC Nov 30, 2021
Efficient and robust high-dimensional sparse logistic regression via nonlinear primal-dual hybrid gradient algorithms
Jérôme Darbon, Gabriel P. Langlois
Logistic regression is a widely used statistical model to describe the relationship between a binary response variable and predictor variables in data sets. It is often used in machine learning to identify important predictor variables. This task, variable selection, typically amounts to fitting a logistic regression model regularized by a convex combination of $\ell_1$ and $\ell_{2}^{2}$ penalties. Since modern big data sets can contain hundreds of thousands to billions of predictor variables, variable selection methods depend on efficient and robust optimization algorithms to perform well. State-of-the-art algorithms for variable selection, however, were not traditionally designed to handle big data sets; they either scale poorly in size or are prone to produce unreliable numerical results. It therefore remains challenging to perform variable selection on big data sets without access to adequate and costly computational resources. In this paper, we propose a nonlinear primal-dual algorithm that addresses these shortcomings. Specifically, we propose an iterative algorithm that provably computes a solution to a logistic regression problem regularized by an elastic net penalty in $O(T(m,n)\log(1/\epsilon))$ operations, where $\epsilon \in (0,1)$ denotes the tolerance and $T(m,n)$ denotes the number of arithmetic operations required to perform matrix-vector multiplication on a data set with $m$ samples each comprising $n$ features. This result improves on the known complexity bound of $O(\min(m^2n,mn^2)\log(1/\epsilon))$ for first-order optimization methods such as the classic primal-dual hybrid gradient or forward-backward splitting methods.
OC Sep 24, 2021
Accelerated nonlinear primal-dual hybrid gradient methods with applications to supervised machine learning
Jérôme Darbon, Gabriel P. Langlois
The linear primal-dual hybrid gradient (PDHG) method is a first-order method that splits convex optimization problems with saddle-point structure into smaller subproblems. Unlike those obtained in most splitting methods, these subproblems can generally be solved efficiently because they involve simple operations such as matrix-vector multiplications or proximal mappings that are fast to evaluate numerically. This advantage comes at the price that the linear PDHG method requires precise stepsize parameters for the problem at hand to achieve an optimal convergence rate. Unfortunately, these stepsize parameters are often prohibitively expensive to compute for large-scale optimization problems, such as those in machine learning. This issue makes the otherwise simple linear PDHG method unsuitable for such problems, and it is shared by most first-order optimization methods. To address this issue, we introduce accelerated nonlinear PDHG methods that achieve an optimal convergence rate with stepsize parameters that are simple and efficient to compute. We prove rigorous convergence results, including results for strongly convex or smooth problems posed on infinite-dimensional reflexive Banach spaces. We illustrate the efficiency of our methods on $\ell_{1}$-constrained logistic regression and entropy-regularized matrix games. Our numerical experiments show that the nonlinear PDHG methods are considerably faster than competing methods.
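For readers unfamiliar with the baseline this abstract starts from, the following is a minimal sketch of the standard linear PDHG iteration, applied to an assumed test problem, min over x of lam·‖x‖₁ + ½‖Ax − b‖². It is not the accelerated nonlinear method proposed in the paper. The function names, the 0.9 safety factor, and the identity-matrix example are assumptions chosen so that the answer has a known closed form.

```python
import numpy as np

def soft_threshold(v, kappa):
    # Proximal mapping of kappa * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def pdhg_lasso(A, b, lam, iters=5000):
    """Linear PDHG for min_x lam*||x||_1 + 0.5*||A x - b||^2.

    Saddle-point form: min_x max_y <A x, y> - f*(y) + lam*||x||_1,
    with f*(y) = 0.5*||y||^2 + <b, y>.
    """
    m, n = A.shape
    # Step sizes depend on the largest singular value of A.
    op_norm = np.linalg.norm(A, 2)
    tau = sigma = 0.9 / op_norm  # gives tau * sigma * ||A||^2 < 1
    x = np.zeros(n)
    x_bar = x.copy()
    y = np.zeros(m)
    for _ in range(iters):
        # Dual step: closed-form proximal mapping of sigma * f*
        y = (y + sigma * (A @ x_bar) - sigma * b) / (1.0 + sigma)
        # Primal step: soft-thresholding
        x_new = soft_threshold(x - tau * (A.T @ y), tau * lam)
        # Extrapolation with parameter 1
        x_bar = 2.0 * x_new - x
        x = x_new
    return x

# With A = I the minimizer is soft_threshold(b, lam), which makes the run checkable.
A = np.eye(3)
b = np.array([2.0, -0.2, -3.0])
x_hat = pdhg_lasso(A, b, lam=0.5)
```

Each iteration uses only matrix-vector products and two cheap proximal mappings, as the abstract describes. The call to `np.linalg.norm(A, 2)` is the step-size ingredient that depends on the whole matrix, and it is this kind of quantity that can become costly for large data sets.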
OC May 28, 2021
On Hamilton-Jacobi PDEs and image denoising models with certain non-additive noise
Jérôme Darbon, Tingwei Meng, Elena Resmerita
We consider image denoising problems formulated as variational problems. It is known that Hamilton-Jacobi PDEs govern the solution of such optimization problems when the noise model is additive. In this work, we address certain non-additive noise models and show that they are also related to Hamilton-Jacobi PDEs. These findings allow us to establish new connections between additive and non-additive noise imaging models. Specifically, we study how the solutions to these optimization problems depend on the parameters and the observed images. We show that the optimal values are ruled by some Hamilton-Jacobi PDEs, while the optimizers are characterized by the spatial gradient of the solution to the Hamilton-Jacobi PDEs. Moreover, we use these relations to investigate the asymptotic behavior of the variational model as the parameter goes to infinity, that is, when the influence of the noise vanishes. With these connections, some non-convex models for non-additive noise can be solved by applying convex optimization algorithms to the equivalent convex models for additive noise. Several numerical results are provided for denoising problems with Poisson noise or multiplicative noise.
OC May 7, 2021
Neural network architectures using min-plus algebra for solving certain high dimensional optimal control problems and Hamilton-Jacobi PDEs
Jérôme Darbon, Peter M. Dower, Tingwei Meng
Solving high dimensional optimal control problems and the corresponding Hamilton-Jacobi PDEs is an important but challenging problem in control engineering. In this paper, we propose two abstract neural network architectures, which are used to compute the value function and the optimal control, respectively, for a certain class of high dimensional optimal control problems. We provide the mathematical analysis for the two abstract architectures. We also show several numerical results computed using the deep neural network implementations of these abstract architectures. A preliminary implementation of our proposed neural network architecture on FPGAs shows a promising speedup compared to CPUs. This work paves the way to leverage efficient dedicated hardware designed for neural networks to solve high dimensional optimal control problems and Hamilton-Jacobi PDEs.
OC Apr 22, 2021
Connecting Hamilton-Jacobi partial differential equations with maximum a posteriori and posterior mean estimators for some non-convex priors
Jérôme Darbon, Gabriel P. Langlois, Tingwei Meng
Many imaging problems can be formulated as inverse problems expressed as finite-dimensional optimization problems. These optimization problems generally consist of minimizing the sum of a data fidelity term and a regularization term. In [23,26], connections between these optimization problems and (multi-time) Hamilton-Jacobi partial differential equations have been proposed under convexity assumptions on both the data fidelity and regularization terms. In particular, under these convexity assumptions, some representation formulas for a minimizer can be obtained. From a Bayesian perspective, such a minimizer can be seen as a maximum a posteriori estimator. In this chapter, we consider a certain class of non-convex regularizations and show that similar representation formulas for the minimizer can also be obtained. This is achieved by leveraging min-plus algebra techniques that were originally developed for solving certain Hamilton-Jacobi partial differential equations arising in optimal control. Note that connections between viscous Hamilton-Jacobi partial differential equations and Bayesian posterior mean estimators with Gaussian data fidelity terms and log-concave priors have been highlighted in [25]. We also present similar results for certain Bayesian posterior mean estimators with Gaussian data fidelity and certain non-log-concave priors using an analogue of min-plus algebra techniques.
OC Apr 6, 2021
A Caputo fractional derivative-based algorithm for optimization
Yeonjong Shin, Jérôme Darbon, George Em Karniadakis
We propose a novel Caputo fractional derivative-based optimization algorithm. Upon defining the Caputo fractional gradient with respect to the Cartesian coordinate, we present a generic Caputo fractional gradient descent (CFGD) method. We prove that the CFGD yields the steepest descent direction of a locally smoothed objective function. The generic CFGD requires three parameters to be specified, and a choice of the parameters yields a version of CFGD. We propose three versions: non-adaptive, adaptive terminal, and adaptive order. By focusing on quadratic objective functions, we provide a convergence analysis. We prove that the non-adaptive CFGD converges to a Tikhonov regularized solution. For the two adaptive versions, we derive error bounds, which show convergence to an integer-order stationary point under some conditions. We derive an explicit formula of CFGD for quadratic functions. We computationally found that the adaptive terminal (AT) CFGD mitigates the dependence on the condition number in the rate of convergence and results in significant acceleration over gradient descent (GD). For non-quadratic functions, we develop an efficient implementation of CFGD using the Gauss-Jacobi quadrature, whose computational cost is approximately proportional to the number of quadrature points and the cost of GD. Our numerical examples show that AT-CFGD results in acceleration over GD, even when a small number of Gauss-Jacobi quadrature points (including a single point) is used.