Amirhossein Taghvaei

LG
h-index: 18
Papers: 17
Citations: 557
Novelty: 52%
AI Score: 45

17 Papers

OC Dec 21, 2017
Kalman Filter and its Modern Extensions for the Continuous-time Nonlinear Filtering Problem

Amirhossein Taghvaei, Jana de Wiljes, Prashant G. Mehta et al.

This paper is concerned with the filtering problem in continuous time. Three algorithmic solution approaches for this problem are reviewed: (i) the classical Kalman-Bucy filter, which provides an exact solution for the linear Gaussian problem, (ii) the ensemble Kalman-Bucy filter (EnKBF), which is an approximate filter and represents an extension of the Kalman-Bucy filter to nonlinear problems, and (iii) the feedback particle filter (FPF), which represents an extension of the EnKBF and furthermore provides a consistent solution in the general nonlinear, non-Gaussian case. The common feature of the three algorithms is the gain times error formula to implement the update step (to account for conditioning due to the observations) in the filter. In contrast to the commonly used sequential Monte Carlo methods, the EnKBF and FPF avoid the resampling of the particles in the importance sampling update step. Moreover, the feedback control structure provides for error correction, potentially leading to smaller simulation variance and improved stability properties. The paper also discusses the issue of non-uniqueness of the filter update formula and formulates a novel approximation algorithm based on ideas from optimal transport and coupling of measures. Performance of this and other algorithms is illustrated for a numerical example.
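For orientation, the "gain times error" structure mentioned in the abstract can be written out for the classical Kalman-Bucy filter. This is a sketch under an assumed linear Gaussian model; the symbols $A$, $H$, $\sigma_B$, $R$, $m_t$, $\Sigma_t$ and $K_t$ are illustrative notation, not taken from the paper.

```latex
% Assumed model (illustrative notation):
%   dX_t = A X_t\,dt + \sigma_B\,dB_t,   dZ_t = H X_t\,dt + dW_t,   Cov(dW_t) = R\,dt
\begin{aligned}
dm_t &= A m_t\,dt + K_t\,(dZ_t - H m_t\,dt), \\
K_t &= \Sigma_t H^\top R^{-1}, \\
\frac{d\Sigma_t}{dt} &= A\Sigma_t + \Sigma_t A^\top + \sigma_B\sigma_B^\top - \Sigma_t H^\top R^{-1} H \Sigma_t .
\end{aligned}
```

The first line is the update the abstract refers to: the conditional mean is corrected by the gain $K_t$ multiplied by the error between the observed increment and its prediction.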

OC Sep 30, 2019
Diffusion map-based algorithm for Gain function approximation in the Feedback Particle Filter

Amirhossein Taghvaei, Prashant G. Mehta, Sean P. Meyn

Feedback particle filter (FPF) is a numerical algorithm to approximate the solution of the nonlinear filtering problem in continuous-time settings. In any numerical implementation of the FPF algorithm, the main challenge is to numerically approximate the so-called gain function. A numerical algorithm for gain function approximation is the subject of this paper. The exact gain function is the solution of a Poisson equation involving a probability-weighted Laplacian $Δ_ρ$. The numerical problem is to approximate this solution using only finitely many particles sampled from the probability distribution $ρ$. A diffusion map-based algorithm was proposed by the authors in a prior work to solve this problem. The algorithm is named as such because it involves, as an intermediate step, a diffusion map approximation of the exact semigroup $e^{Δ_ρ}$. The original contribution of this paper is to carry out a rigorous error analysis of the diffusion map-based algorithm. The error is shown to include two components: bias and variance. The bias results from the diffusion map approximation of the exact semigroup. The variance arises because of finite sample size. Scalings and upper bounds are derived for bias and variance. These bounds are then illustrated with numerical experiments that serve to emphasize the effects of problem dimension and sample size. The proposed algorithm is applied to two filtering examples, and comparisons are provided with the sequential importance resampling (SIR) particle filter.
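The Poisson equation referred to in the abstract is commonly written as follows. This is a sketch in one common sign convention; the symbols $h$ (observation function), $\hat{h}$ (its mean under $ρ$), $\phi$ (potential) and $K$ (gain) are assumed notation rather than quoted from the paper.

```latex
% Assumed notation: \hat{h} = \int h\,\rho\,dx, and \phi is normalized so that \int \phi\,\rho\,dx = 0.
\begin{aligned}
-\Delta_\rho \phi &= h - \hat{h}, \qquad \Delta_\rho \phi := \frac{1}{\rho}\,\nabla\cdot(\rho\,\nabla\phi), \\
K &= \nabla\phi, \\
\int \nabla\phi\cdot\nabla\psi\;\rho\,dx &= \int (h - \hat{h})\,\psi\;\rho\,dx \quad \text{for all test functions } \psi .
\end{aligned}
```

The last line is the weak form obtained by multiplying by $\psi\rho$ and integrating by parts. It involves only expectations under $ρ$, which is what makes particle-based approximation possible.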

NA Dec 16, 2016
Error Estimates for the Kernel Gain Function Approximation in the Feedback Particle Filter

Amirhossein Taghvaei, Prashant G. Mehta, Sean P. Meyn

This paper is concerned with the analysis of the kernel-based algorithm for gain function approximation in the feedback particle filter. The exact gain function is the solution of a Poisson equation involving a probability-weighted Laplacian. The kernel-based method, introduced in our prior work, allows one to approximate this solution using only particles sampled from the probability distribution. This paper describes new representations and algorithms based on the kernel-based method. Theory surrounding the approximation is improved and a novel formula for the gain function approximation is derived. A procedure for carrying out error analysis of the approximation is introduced. Certain asymptotic estimates for bias and variance are derived for the general nonlinear non-Gaussian case. Comparison with the constant gain function approximation is provided. The results are illustrated with the aid of some numerical experiments.
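The constant gain approximation mentioned as the comparison baseline can be sketched in a few lines. The formula used here, $K \approx \frac{1}{N}\sum_i (h(X^i) - \hat{h})\,X^i$, is the form usually quoted in the FPF literature; the function name `constant_gain`, the scalar observation function and the Gaussian test particles are assumptions made for illustration, not the paper's kernel-based method.

```python
import random

def constant_gain(particles, h):
    """Constant-gain approximation: K ~ (1/N) * sum_i (h(X_i) - h_hat) * X_i.

    `particles` is a list of state vectors (lists of floats) and `h` is a
    scalar-valued observation function. Both names are illustrative.
    """
    n = len(particles)
    dim = len(particles[0])
    h_vals = [h(x) for x in particles]
    h_hat = sum(h_vals) / n  # empirical mean of h over the particles
    return [
        sum((hv - h_hat) * x[j] for hv, x in zip(h_vals, particles)) / n
        for j in range(dim)
    ]

# Hypothetical example: independent Gaussian coordinates with standard
# deviations 2 and 1, observing the first coordinate only.
random.seed(0)
particles = [[random.gauss(0.0, 2.0), random.gauss(0.0, 1.0)] for _ in range(20000)]
gain = constant_gain(particles, lambda x: x[0])
```

With a linear observation function the result is the empirical covariance between the state and the observation, so the first component should be close to the variance 4 and the second close to 0, which matches the Kalman gain structure in the linear Gaussian case.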

LG Oct 21, 2023
Nonlinear Filtering with Brenier Optimal Transport Maps

Mohammad Al-Jarrah, Niyizhen Jin, Bamdad Hosseini et al.

This paper is concerned with the problem of nonlinear filtering, i.e., computing the conditional distribution of the state of a stochastic dynamical system given a history of noisy partial observations. Conventional sequential importance resampling (SIR) particle filters suffer from fundamental limitations, in scenarios involving degenerate likelihoods or high-dimensional states, due to the weight degeneracy issue. In this paper, we explore an alternative method, which is based on estimating the Brenier optimal transport (OT) map from the current prior distribution of the state to the posterior distribution at the next time step. Unlike SIR particle filters, the OT formulation does not require the analytical form of the likelihood. Moreover, it allows us to harness the approximation power of neural networks to model complex and multi-modal distributions and employ stochastic optimization algorithms to enhance scalability. Extensive numerical experiments are presented that compare the OT method to the SIR particle filter and the ensemble Kalman filter, evaluating the performance in terms of sample efficiency, high-dimensional scalability, and the ability to capture complex and multi-modal distributions.

SY Apr 1
Causal Optimal Coupling for Gaussian Input-Output Distributional Data

Daran Xu, Amirhossein Taghvaei

We study the problem of identifying an optimal coupling between input-output distributional data generated by a causal dynamical system. The coupling is required to satisfy prescribed marginal distributions and a causality constraint reflecting the temporal structure of the system. We formulate this problem as a Schrödinger Bridge, which seeks the coupling closest, in Kullback-Leibler divergence, to a given prior while enforcing both marginal and causality constraints. For the case of Gaussian marginals and general time-dependent quadratic cost functions, we derive a fully tractable characterization of the Sinkhorn iterations that converge to the optimal solution. Beyond its theoretical contribution, the proposed framework provides a principled foundation for applying causal optimal transport methods to system identification from distributional data.
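To make the idea of Sinkhorn iterations concrete, here is a minimal sketch of the generic discrete version: alternately rescale the rows and columns of a prior kernel until the coupling has the prescribed marginals. This is not the paper's causal Gaussian construction and it enforces no causality constraint; the function name `sinkhorn`, the regularization value `eps` and the three-point example are assumptions for illustration.

```python
import math

def sinkhorn(cost, mu, nu, eps=1.0, iters=500):
    """Generic discrete Sinkhorn iterations (illustrative, no causality constraint).

    Returns the coupling u_i * K_ij * v_j with K_ij = exp(-cost_ij / eps),
    i.e. the coupling closest in KL divergence to the prior kernel K among
    couplings with marginals mu and nu.
    """
    n, m = len(mu), len(nu)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Rescale rows to match mu, then columns to match nu.
        u = [mu[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical example: three support points with a quadratic cost.
points = [0.0, 1.0, 2.0]
cost = [[(x - y) ** 2 for y in points] for x in points]
mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]
plan = sinkhorn(cost, mu, nu)
```

Each half-step is a KL projection onto one marginal constraint, which is why the scheme fits the Schrödinger Bridge formulation in the abstract.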

LG Apr 3
Conditional Sampling via Wasserstein Autoencoders and Triangular Transport

Mohammad Al-Jarrah, Michele Martino, Marcus Yim et al.

We present Conditional Wasserstein Autoencoders (CWAEs), a framework for conditional simulation that exploits low-dimensional structure in both the conditioned and the conditioning variables. The key idea is to modify a Wasserstein autoencoder to use a (block-) triangular decoder and impose an appropriate independence assumption on the latent variables. We show that the resulting model gives an autoencoder that can exploit low-dimensional structure while simultaneously the decoder can be used for conditional simulation. We explore various theoretical properties of CWAEs, including their connections to conditional optimal transport (OT) problems. We also present alternative formulations that lead to three architectural variants forming the foundation of our algorithms. We present a series of numerical experiments that demonstrate that our different CWAE variants achieve substantial reductions in approximation error relative to the low-rank ensemble Kalman filter (LREnKF), particularly in problems where the support of the conditional measures is truly low-dimensional.

OC Mar 16, 2025
Fast filtering of non-Gaussian models using Amortized Optimal Transport Maps

Mohammad Al-Jarrah, Bamdad Hosseini, Amirhossein Taghvaei

In this paper, we present the amortized optimal transport filter (A-OTF) designed to mitigate the computational burden associated with the real-time training of optimal transport filters (OTFs). OTFs can perform accurate non-Gaussian Bayesian updates in the filtering procedure, but they require training at every time step, which makes them expensive. The proposed A-OTF framework exploits the similarity between OTF maps during an initial/offline training stage in order to reduce the cost of inference during online calculations. More precisely, we use clustering algorithms to select relevant subsets of pre-trained maps whose weighted average is used to compute the A-OTF model akin to a mixture of experts. A series of numerical experiments validate that A-OTF achieves substantial computational savings during online inference while preserving the inherent flexibility and accuracy of OTF.

LG Dec 4, 2021
Variational Wasserstein gradient flow

Jiaojiao Fan, Qinsheng Zhang, Amirhossein Taghvaei et al.

Wasserstein gradient flow has emerged as a promising approach to solve optimization problems over the space of probability distributions. A recent trend is to use the well-known JKO scheme in combination with input convex neural networks to numerically implement the proximal step. The most challenging step, in this setup, is to evaluate functions involving density explicitly, such as entropy, in terms of samples. This paper builds on the recent works with a slight but crucial difference: we propose to utilize a variational formulation of the objective function formulated as maximization over a parametric class of functions. Theoretically, the proposed variational formulation allows the construction of gradient flows directly for empirical distributions with a well-defined and meaningful objective function. Computationally, this approach replaces the computationally expensive step in existing methods, to handle objective functions involving density, with inner loop updates that only require a small batch of samples and scale well with the dimension. The performance and scalability of the proposed method are illustrated with the aid of several numerical experiments involving high-dimensional synthetic and real datasets.
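The JKO scheme named in the abstract is a proximal (implicit Euler) step in the 2-Wasserstein metric. A sketch of its standard form, with assumed notation: $F$ is the objective functional, $\tau > 0$ the step size, and $\rho_k$ the current distribution.

```latex
% Assumed notation: F objective functional, \tau > 0 step size, W_2 the 2-Wasserstein distance.
\rho_{k+1} \in \arg\min_{\rho} \; \frac{1}{2\tau}\, W_2^2(\rho, \rho_k) + F(\rho)
```

When $F$ involves the density explicitly (for example an entropy term), evaluating it from samples alone is the difficulty the abstract describes.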

LG Oct 2, 2020
Deep FPF: Gain function approximation in high-dimensional setting

S. Yagiz Olmez, Amirhossein Taghvaei, Prashant G. Mehta

In this paper, we present a novel approach to approximate the gain function of the feedback particle filter (FPF). The exact gain function is the solution of a Poisson equation involving a probability-weighted Laplacian. The numerical problem is to approximate the exact gain function using only finitely many particles sampled from the probability distribution. Inspired by the recent success of the deep learning methods, we represent the gain function as a gradient of the output of a neural network. Thereupon considering a certain variational formulation of the Poisson equation, an optimization problem is posed for learning the weights of the neural network. A stochastic gradient algorithm is described for this purpose. The proposed approach has two significant properties/advantages: (i) The stochastic optimization algorithm allows one to process, in parallel, only a batch of samples (particles) ensuring good scaling properties with the number of particles; (ii) The remarkable representation power of neural networks means that the algorithm is potentially applicable and useful to solve high-dimensional problems. We numerically establish these two properties and provide extensive comparison to the existing approaches.

LG Jul 8, 2020
Scalable Computations of Wasserstein Barycenter via Input Convex Neural Networks

Jiaojiao Fan, Amirhossein Taghvaei, Yongxin Chen

Wasserstein Barycenter is a principled approach to represent the weighted mean of a given set of probability distributions, utilizing the geometry induced by optimal transport. In this work, we present a novel scalable algorithm to approximate the Wasserstein Barycenters aiming at high-dimensional applications in machine learning. Our proposed algorithm is based on the Kantorovich dual formulation of the Wasserstein-2 distance as well as a recent neural network architecture, input convex neural network, that is known to parametrize convex functions. The distinguishing features of our method are: i) it only requires samples from the marginal distributions; ii) unlike the existing approaches, it represents the Barycenter with a generative model and can thus generate infinite samples from the barycenter without querying the marginal distributions; iii) it works similarly to a Generative Adversarial Model in the one-marginal case. We demonstrate the efficacy of our algorithm by comparing it with the state-of-the-art methods in multiple experiments.

SY Oct 5, 2019
An Optimal Transport Formulation of the Ensemble Kalman Filter

Amirhossein Taghvaei, Prashant G. Mehta

Controlled interacting particle systems such as the ensemble Kalman filter (EnKF) and the feedback particle filter (FPF) are numerical algorithms to approximate the solution of the nonlinear filtering problem in continuous time. The distinguishing feature of these algorithms is that the Bayesian update step is implemented using a feedback control law. It has been noted in the literature that the control law is not unique. This is the main problem addressed in this paper. To obtain a unique control law, the filtering problem is formulated here as an optimal transportation problem. An explicit formula for the (mean-field type) optimal control law is derived in the linear Gaussian setting. Comparisons are made with the control laws for different types of EnKF algorithms described in the literature. Via empirical approximation of the mean-field control law, a finite-$N$ controlled interacting particle algorithm is obtained. For this algorithm, the equations for empirical mean and covariance are derived and shown to be identical to the Kalman filter. This allows strong conclusions on convergence and error properties based on the classical filter stability theory for the Kalman filter. It is shown that, under certain technical conditions, the mean squared error (m.s.e.) converges to zero even with a finite number of particles. A detailed propagation of chaos analysis is carried out for the finite-$N$ algorithm. The analysis is used to prove weak convergence of the empirical distribution as $N\rightarrow\infty$. For a certain simplified filtering problem, analytical comparison of the m.s.e. with the importance sampling-based algorithms is described. The analysis helps explain the favorable scaling properties of the control-based algorithms reported in several numerical studies in recent literature.

LG Aug 28, 2019
Optimal transport mapping via input convex neural networks

Ashok Vardhan Makkuva, Amirhossein Taghvaei, Sewoong Oh et al.

In this paper, we present a novel and principled approach to learn the optimal transport between two distributions, from samples. Guided by the optimal transport theory, we learn the optimal Kantorovich potential which induces the optimal transport map. This involves learning two convex functions, by solving a novel minimax optimization. Building upon recent advances in the field of input convex neural networks, we propose a new framework where the gradient of one convex function represents the optimal transport mapping. Numerical experiments confirm that we learn the optimal transport mapping. This approach ensures that the transport mapping we find is optimal independent of how we initialize the neural networks. Further, target distributions from a discontinuous support can be easily captured, as gradient of a convex function naturally models a discontinuous transport mapping.

OC Feb 19, 2019
2-Wasserstein Approximation via Restricted Convex Potentials with Application to Improved Training for GANs

Amirhossein Taghvaei, Amin Jalali

We provide a framework to approximate the 2-Wasserstein distance and the optimal transport map, amenable to efficient training as well as statistical and geometric analysis. With the quadratic cost and considering the Kantorovich dual form of the optimal transportation problem, the Brenier theorem states that the optimal potential function is convex and the optimal transport map is the gradient of the optimal potential function. Using this geometric structure, we restrict the optimization problem to different parametrized classes of convex functions and pay special attention to the class of input-convex neural networks. We analyze the statistical generalization and the discriminative power of the resulting approximate metric, and we prove a restricted moment-matching property for the approximate optimal map. Finally, we discuss a numerical algorithm to solve the restricted optimization problem and provide numerical experiments to illustrate and compare the proposed approach with the established regularization-based approaches. We further discuss practical implications of our proposal in a modular and interpretable design for GANs which connects the generator training with discriminator computations to allow for learning an overall composite generator.
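A closed-form instance of the Brenier structure described above is the Gaussian case, where the optimal map is affine and the potential is a convex quadratic. This worked example is standard but is not taken from the paper; the symbols $m_1, m_2, \Sigma_1, \Sigma_2, A, \varphi$ are assumed notation.

```latex
% Assumed notation: source N(m_1, \Sigma_1), target N(m_2, \Sigma_2), with \Sigma_1 invertible.
\begin{aligned}
T(x) &= \nabla\varphi(x) = m_2 + A\,(x - m_1), \\
A &= \Sigma_1^{-1/2}\bigl(\Sigma_1^{1/2}\,\Sigma_2\,\Sigma_1^{1/2}\bigr)^{1/2}\,\Sigma_1^{-1/2}, \\
\varphi(x) &= m_2^\top x + \tfrac{1}{2}\,(x - m_1)^\top A\,(x - m_1).
\end{aligned}
```

Since $A$ is symmetric positive semidefinite, $\varphi$ is convex, and its gradient pushes the source Gaussian onto the target, as the Brenier theorem requires.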

LG Jan 10, 2019
Accelerated Flow for Probability Distributions

Amirhossein Taghvaei, Prashant G. Mehta

This paper presents a methodology and numerical algorithms for constructing accelerated gradient flows on the space of probability distributions. In particular, we extend the recent variational formulation of accelerated gradient methods in (Wibisono et al., 2016) from vector-valued variables to probability distributions. The variational problem is modeled as a mean-field optimal control problem. The maximum principle of optimal control theory is used to derive Hamilton's equations for the optimal gradient flow. Hamilton's equations are shown to achieve the accelerated form of density transport from any initial probability distribution to a target probability distribution. A quantitative estimate on the asymptotic convergence rate is provided based on a Lyapunov function construction, when the objective functional is displacement convex. Two numerical approximations are presented to implement Hamilton's equations as a system of $N$ interacting particles. The continuous limit of Nesterov's algorithm is shown to be a special case with $N=1$. The algorithm is illustrated with numerical examples.

PR Sep 20, 2018
Error Analysis of the Stochastic Linear Feedback Particle Filter

Amirhossein Taghvaei, Prashant G. Mehta

This paper is concerned with the convergence and long-term stability analysis of the feedback particle filter (FPF) algorithm. The FPF is an interacting system of $N$ particles where the interaction is designed such that the empirical distribution of the particles approximates the posterior distribution. It is known that in the mean-field limit ($N=\infty$), the distribution of the particles is equal to the posterior distribution. However, little is known about the convergence to the mean-field limit. In this paper, we consider the FPF algorithm for the linear Gaussian setting. In this setting, the algorithm is similar to the ensemble Kalman-Bucy filter algorithm. Although these algorithms have been numerically evaluated and widely used in applications, their convergence and long-term stability analysis remains an active area of research. In this paper, we show that (i) the mean-field limit is well-defined with a unique strong solution; (ii) the mean-field process is stable with respect to the initial condition; and (iii) under the conditions we provide, the finite-$N$ system is long-term stable, and we obtain some mean-squared error estimates that are uniform in time.

OC Sep 27, 2017
How regularization affects the critical points in linear networks

Amirhossein Taghvaei, Jin W. Kim, Prashant G. Mehta

This paper is concerned with the problem of representing and learning a linear transformation using a linear neural network. In recent years, there has been a growing interest in the study of such networks, in part due to the successes of deep learning. The main question of this body of research, and also of this paper, pertains to the existence and optimality properties of the critical points of the mean-squared loss function. The primary concern here is the robustness of the critical points with regularization of the loss function. An optimal control model is introduced for this purpose, and a learning algorithm (a regularized form of backprop) is derived for it using Hamilton's formulation of optimal control. The formulation is used to provide a complete characterization of the critical points in terms of the solutions of a nonlinear matrix-valued equation, referred to as the characteristic equation. Analytical and numerical tools from bifurcation theory are used to compute the critical points via the solutions of the characteristic equation. The main conclusion is that the critical point diagram can be fundamentally different even with arbitrarily small amounts of regularization.

LG May 20, 2016
Adversarial Delays in Online Strongly-Convex Optimization

Daniel Khashabi, Kent Quanrud, Amirhossein Taghvaei

We consider the problem of strongly-convex online optimization in the presence of adversarial delays; in a $T$-iteration online game, the feedback of the player's query at time $t$ is arbitrarily delayed by an adversary for $d_t$ rounds and delivered before the game ends, at iteration $t+d_t-1$. Specifically, for the online-gradient-descent algorithm, we show that it has a simple regret bound of $O\left(\sum_{t=1}^T \log\left(1+ \frac{d_t}{t}\right)\right)$. This gives a clear and simple bound without resorting to any distributional and limiting assumptions on the delays. We further show how this result encompasses and generalizes several of the existing known results in the literature. Specifically, it matches the celebrated logarithmic regret $O(\log T)$ when there are no delays (i.e. $d_t = 1$) and the regret bound of $O(τ\log T)$ for constant delays $d_t = τ$.
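The two special cases quoted at the end of the abstract can be checked numerically by evaluating the sum inside the bound. The sketch below computes only $\sum_{t=1}^T \log(1 + d_t/t)$ and ignores the constant hidden in the $O(\cdot)$; the function name `delayed_regret_sum` and the values of `T` and `tau` are assumptions for illustration.

```python
import math

def delayed_regret_sum(delays):
    """Evaluate sum_{t=1}^T log(1 + d_t / t) for a list of delays d_1..d_T.

    This is the quantity inside the O(.) bound; hidden constants are ignored.
    """
    return sum(math.log(1.0 + d / t) for t, d in enumerate(delays, start=1))

T = 1000    # hypothetical horizon
tau = 5     # hypothetical constant delay

# No delays (d_t = 1): the sum telescopes to log(T + 1), i.e. O(log T).
no_delay = delayed_regret_sum([1] * T)

# Constant delays (d_t = tau): log(1 + x) <= x gives at most tau * (1 + log T).
constant_delay = delayed_regret_sum([tau] * T)
```

In the no-delay case each term is $\log\frac{t+1}{t}$, so the sum collapses to $\log(T+1)$. In the constant-delay case the harmonic-sum bound yields the $O(τ\log T)$ rate stated in the abstract.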