OCMay 31, 2019
Optimal Control of Fractional Elliptic PDEs with State Constraints and Characterization of the dual of Fractional Order Sobolev SpacesHarbir Antil, Deepanshu Verma, Mahamadi Warma
This paper introduces the notion of state constraints for optimal control problems governed by fractional elliptic PDEs of order $s \in (0,1)$. There are several mathematical tools that are developed during the process to study this problem, for instance, the characterization of the dual of the fractional order Sobolev spaces and well-posedness of fractional PDEs with measure-valued datum. These tools are widely applicable. We show well-posedness of the optimal control problem and derive the first order optimality conditions. Notice that the adjoint equation is a fractional PDE with measure as the right-hand-side datum. We use the characterization of the fractional order dual spaces to study the regularity of the state and adjoint equations. We emphasize that the classical case ($s=1$) was considered by E. Casas in \cite{ECasas_1986a} but almost none of the existing results are applicable to our fractional case.
MLOct 25, 2023
Efficient Neural Network Approaches for Conditional Optimal Transport with Applications in Bayesian InferenceZheyu Oliver Wang, Ricardo Baptista, Youssef Marzouk et al.
We present two neural network approaches that approximate the solutions of static and dynamic $\unicode{x1D450}\unicode{x1D45C}\unicode{x1D45B}\unicode{x1D451}\unicode{x1D456}\unicode{x1D461}\unicode{x1D456}\unicode{x1D45C}\unicode{x1D45B}\unicode{x1D44E}\unicode{x1D459}\unicode{x0020}\unicode{x1D45C}\unicode{x1D45D}\unicode{x1D461}\unicode{x1D456}\unicode{x1D45A}\unicode{x1D44E}\unicode{x1D459}\unicode{x0020}\unicode{x1D461}\unicode{x1D45F}\unicode{x1D44E}\unicode{x1D45B}\unicode{x1D460}\unicode{x1D45D}\unicode{x1D45C}\unicode{x1D45F}\unicode{x1D461}$ (COT) problems. Both approaches enable conditional sampling and conditional density estimation, which are core tasks in Bayesian inference$\unicode{x2013}$particularly in the simulation-based ($\unicode{x201C}$likelihood-free$\unicode{x201D}$) setting. Our methods represent the target conditional distribution as a transformation of a tractable reference distribution. Obtaining such a transformation, chosen here to be an approximation of the COT map, is computationally challenging even in moderate dimensions. To improve scalability, our numerical algorithms use neural networks to parameterize candidate maps and further exploit the structure of the COT problem. Our static approach approximates the map as the gradient of a partially input-convex neural network. It uses a novel numerical implementation to increase computational efficiency compared to state-of-the-art alternatives. Our dynamic approach approximates the conditional optimal transport via the flow map of a regularized neural ODE; compared to the static approach, it is slower to train but offers more modeling choices and can lead to faster sampling. We demonstrate both algorithms numerically, comparing them with competing state-of-the-art approaches, using benchmark datasets and simulation-based Bayesian inverse problems.
94.8OCApr 25
On the Convergence of Jacobian-Free Backpropagation for Optimal Control Problems with Implicit HamiltoniansEric Gelphman, Deepanshu Verma, Nicole Tianjiao Yang et al.
Optimal feedback control with implicit Hamiltonians poses a fundamental challenge for learning-based value function methods due to the absence of closed-form optimal control laws. Recent work~\cite{gelphman2025end} introduced an implicit deep learning approach using Jacobian-Free Backpropagation (JFB) to address this setting, but only established sample-wise descent guarantees. In this paper, we establish convergence guarantees for JFB in the stochastic minibatch setting, showing that the resulting updates converge to stationary points of the expected optimal control objective. We further demonstrate scalability on substantially higher-dimensional problems, including multi-agent optimal consumption and swarm-based quadrotor and bicycle control. Together, our results provide both theoretical justification and empirical evidence for using JFB in high-dimensional optimal control with implicit Hamiltonians.
40.6LGMar 24
Boost Like a (Var)Pro: Trust-Region Gradient Boosting via Variable ProjectionAbhijit Chowdhary, Elizabeth Newman, Deepanshu Verma
Gradient boosting, a method of building additive ensembles from weak learners, has established itself as a practical and theoretically-motivated approach to approximate functions, especially using decision tree weak learners. Comparable methods for smooth parametric learners, such as neural networks, remain less developed in both training methodology and theory. To this end, we introduce \texttt{VPBoost} ({\bf V}ariable {\bf P}rojection {\bf Boost}ing), a gradient boosting algorithm for separable smooth approximators, i.e., models with a smooth nonlinear featurizer followed by a final linear mapping. \texttt{VPBoost} fuses variable projection, a training paradigm for separable models that enforces optimality of the linear weights, with a second-order weak learning strategy. The combination of second-order boosting, separable models, and variable projection give rise to a closed-form solution for the optimal linear weights and a natural interpretation of \VPBoost as a functional trust-region method. We thereby leverage trust-region theory to prove \VPBoost converges to a stationary point under mild geometric conditions and, under stronger assumptions, achieves a superlinear convergence rate. Comprehensive numerical experiments on synthetic data, image recognition, and scientific machine learning benchmarks demonstrate that \VPBoost learns an ensemble with improved evaluation metrics in comparison to gradient-descent-based boosting and attains competitive performance relative to an industry-standard decision tree boosting algorithm.
OCNov 13, 2023
Learning Control Policies of Hodgkin-Huxley Neuronal DynamicsMalvern Madondo, Deepanshu Verma, Lars Ruthotto et al.
We present a neural network approach for closed-loop deep brain stimulation (DBS). We cast the problem of finding an optimal neurostimulation strategy as a control problem. In this setting, control policies aim to optimize therapeutic outcomes by tailoring the parameters of a DBS system, typically via electrical stimulation, in real time based on the patient's ongoing neuronal activity. We approximate the value function offline using a neural network to enable generating controls (stimuli) in real time via the feedback form. The neuronal activity is characterized by a nonlinear, stiff system of differential equations as dictated by the Hodgkin-Huxley model. Our training process leverages the relationship between Pontryagin's maximum principle and Hamilton-Jacobi-Bellman equations to update the value function estimates simultaneously. Our numerical experiments illustrate the accuracy of our approach for out-of-distribution samples and the robustness to moderate shocks and disturbances in the system.
OCSep 22, 2025
Zero-Shot Transferable Solution Method for Parametric Optimal Control ProblemsXingjian Li, Kelvin Kan, Deepanshu Verma et al.
This paper presents a transferable solution method for optimal control problems with varying objectives using function encoder (FE) policies. Traditional optimization-based approaches must be re-solved whenever objectives change, resulting in prohibitive computational costs for applications requiring frequent evaluation and adaptation. The proposed method learns a reusable set of neural basis functions that spans the control policy space, enabling efficient zero-shot adaptation to new tasks through either projection from data or direct mapping from problem specifications. The key idea is an offline-online decomposition: basis functions are learned once during offline imitation learning, while online adaptation requires only lightweight coefficient estimation. Numerical experiments across diverse dynamics, dimensions, and cost structures show our method delivers near-optimal performance with minimal overhead when generalizing across tasks, enabling semi-global feedback policies suitable for real-time deployment.
LGOct 1, 2025
Randomized Matrix Sketching for Neural Network Training and Gradient MonitoringHarbir Antil, Deepanshu Verma
Neural network training relies on gradient computation through backpropagation, yet memory requirements for storing layer activations present significant scalability challenges. We present the first adaptation of control-theoretic matrix sketching to neural network layer activations, enabling memory-efficient gradient reconstruction in backpropagation. This work builds on recent matrix sketching frameworks for dynamic optimization problems, where similar state trajectory storage challenges motivate sketching techniques. Our approach sketches layer activations using three complementary sketch matrices maintained through exponential moving averages (EMA) with adaptive rank adjustment, automatically balancing memory efficiency against approximation quality. Empirical evaluation on MNIST, CIFAR-10, and physics-informed neural networks demonstrates a controllable accuracy-memory tradeoff. We demonstrate a gradient monitoring application on MNIST showing how sketched activations enable real-time gradient norm tracking with minimal memory overhead. These results establish that sketched activation storage provides a viable path toward memory-efficient neural network training and analysis.
OCOct 1, 2025
End-to-End Training of High-Dimensional Optimal Control with Implicit Hamiltonians via Jacobian-Free BackpropagationEric Gelphman, Deepanshu Verma, Nicole Tianjiao Yang et al.
Neural network approaches that parameterize value functions have succeeded in approximating high-dimensional optimal feedback controllers when the Hamiltonian admits explicit formulas. However, many practical problems, such as the space shuttle reentry problem and bicycle dynamics, among others, may involve implicit Hamiltonians that do not admit explicit formulas, limiting the applicability of existing methods. Rather than directly parameterizing controls, which does not leverage the Hamiltonian's underlying structure, we propose an end-to-end implicit deep learning approach that directly parameterizes the value function to learn optimal control laws. Our method enforces physical principles by ensuring trained networks adhere to the control laws by exploiting the fundamental relationship between the optimal control and the value function's gradient; this is a direct consequence of the connection between Pontryagin's Maximum Principle and dynamic programming. Using Jacobian-Free Backpropagation (JFB), we achieve efficient training despite temporal coupling in trajectory optimization. We show that JFB produces descent directions for the optimal control objective and experimentally demonstrate that our approach effectively learns high-dimensional feedback controllers across multiple scenarios involving implicit Hamiltonians, which existing methods cannot address.
LGSep 24, 2025
Latent TwinsMatthias Chung, Deepanshu Verma, Max Collins et al.
Over the past decade, scientific machine learning has transformed the development of mathematical and computational frameworks for analyzing, modeling, and predicting complex systems. From inverse problems to numerical PDEs, dynamical systems, and model reduction, these advances have pushed the boundaries of what can be simulated. Yet they have often progressed in parallel, with representation learning and algorithmic solution methods evolving largely as separate pipelines. With \emph{Latent Twins}, we propose a unifying mathematical framework that creates a hidden surrogate in latent space for the underlying equations. Whereas digital twins mirror physical systems in the digital world, Latent Twins mirror mathematical systems in a learned latent space governed by operators. Through this lens, classical modeling, inversion, model reduction, and operator approximation all emerge as special cases of a single principle. We establish the fundamental approximation properties of Latent Twins for both ODEs and PDEs and demonstrate the framework across three representative settings: (i) canonical ODEs, capturing diverse dynamical regimes; (ii) a PDE benchmark using the shallow-water equations, contrasting Latent Twin simulations with DeepONet and forecasts with a 4D-Var baseline; and (iii) a challenging real-data geopotential reanalysis dataset, reconstructing and forecasting from sparse, noisy observations. Latent Twins provide a compact, interpretable surrogate for solution operators that evaluate across arbitrary time gaps in a single-shot, while remaining compatible with scientific pipelines such as assimilation, control, and uncertainty quantification. Looking forward, this framework offers scalable, theory-grounded surrogates that bridge data-driven representation learning and classical scientific modeling across disciplines.
LGApr 1, 2021
Novel DNNs for Stiff ODEs with Applications to Chemically Reacting FlowsThomas S. Brown, Harbir Antil, Rainald Löhner et al.
Chemically reacting flows are common in engineering, such as hypersonic flow, combustion, explosions, manufacturing processes and environmental assessments. For combustion, the number of reactions can be significant (over 100) and due to the very large CPU requirements of chemical reactions (over 99%) a large number of flow and combustion problems are presently beyond the capabilities of even the largest supercomputers. Motivated by this, novel Deep Neural Networks (DNNs) are introduced to approximate stiff ODEs. Two approaches are compared, i.e., either learn the solution or the derivative of the solution to these ODEs. These DNNs are applied to multiple species and reactions common in chemically reacting flows. Experimental results show that it is helpful to account for the physical properties of species while designing DNNs. The proposed approach is shown to generalize well.
NAFeb 8, 2021
Novel Deep neural networks for solving Bayesian statistical inverseHarbir Antil, Howard C Elman, Akwum Onwunta et al.
We consider the simulation of Bayesian statistical inverse problems governed by large-scale linear and nonlinear partial differential equations (PDEs). Markov chain Monte Carlo (MCMC) algorithms are standard techniques to solve such problems. However, MCMC techniques are computationally challenging as they require several thousands of forward PDE solves. The goal of this paper is to introduce a fractional deep neural network based approach for the forward solves within an MCMC routine. Moreover, we discuss some approximation error estimates and illustrate the efficiency of our approach via several numerical examples.
OCApr 1, 2020
Fractional Deep Neural Network via Constrained OptimizationHarbir Antil, Ratna Khatri, Rainald Löhner et al.
This paper introduces a novel algorithmic framework for a deep neural network (DNN), which in a mathematically rigorous manner, allows us to incorporate history (or memory) into the network -- it ensures all layers are connected to one another. This DNN, called Fractional-DNN, can be viewed as a time-discretization of a fractional in time nonlinear ordinary differential equation (ODE). The learning problem then is a minimization problem subject to that fractional ODE as constraints. We emphasize that an analogy between the existing DNN and ODEs, with standard time derivative, is well-known by now. The focus of our work is the Fractional-DNN. Using the Lagrangian approach, we provide a derivation of the backward propagation and the design equations. We test our network on several datasets for classification problems. Fractional-DNN offers various advantages over the existing DNN. The key benefits are a significant improvement to the vanishing gradient issue due to the memory effect, and better handling of nonsmooth data due to the network's ability to approximate non-smooth functions.
OCApr 11, 2019
External optimal control of fractional parabolic PDEsHarbir Antil, Deepanshu Verma, Mahamadi Warma
In this paper we introduce a new notion of optimal control, or source identification in inverse, problems with fractional parabolic PDEs as constraints. This new notion allows a source/control placement outside the domain where the PDE is fulfilled. We tackle the Dirichlet, the Neumann and the Robin cases. For the fractional elliptic PDEs this has been recently investigated by the authors in \cite{HAntil_RKhatri_MWarma_2018a}. The need for these novel optimal control concepts stems from the fact that the classical PDE models only allow placing the source/control either on the boundary or in the interior where the PDE is satisfied. However, the nonlocal behavior of the fractional operator now allows placing the control in the exterior. We introduce the notions of weak and very-weak solutions to the parabolic Dirichlet problem. We present an approach on how to approximate the parabolic Dirichlet solutions by the parabolic Robin solutions (with convergence rates). A complete analysis for the Dirichlet and Robin optimal control problems has been discussed. The numerical examples confirm our theoretical findings and further illustrate the potential benefits of nonlocal models over the local ones.