Arash Bahari Kordabad

LG
h-index5
6papers
51citations
Novelty51%
AI Score47

6 Papers

20.2SYJun 3
Multi-Agent Temporal Logic Planning via Penalty Functions and Block-Coordinate Optimization

Eleftherios E. Vlahakis, Arash Bahari Kordabad, Lars Lindemann et al.

Multi-agent planning under Signal Temporal Logic (STL) is often hindered by collaborative tasks that lead to computational challenges due to the inherent high dimensionality of the problem, preventing scalable synthesis with satisfaction guarantees. To address this, we formulate STL planning as an optimization program under multi-agent STL constraints and introduce a penalty-based unconstrained relaxation that can be efficiently solved via a Block-Coordinate Gradient Descent (BCGD) method, where each block corresponds to a single agent's decision variables, thereby mitigating complexity. By utilizing a quadratic penalty function defined via smooth STL semantics, we show that BCGD iterations converge to a stationary point of the penalized problem under standard regularity assumptions. To enforce feasibility, the BCGD solver is embedded within a two-layer optimization scheme: inner BCGD updates are performed for a fixed penalty parameter, which is then increased in an outer loop to progressively improve multi-agent STL robustness. The proposed framework enables scalable computations and is validated through various complex multi-robot planning scenarios.

LGMar 25, 2022
Quasi-Newton Iteration in Deterministic Policy Gradient

Arash Bahari Kordabad, Hossein Nejatbakhsh Esfahani, Wenqi Cai et al.

This paper presents a model-free approximation for the Hessian of the performance of deterministic policies to use in the context of Reinforcement Learning based on Quasi-Newton steps in the policy parameters. We show that the approximate Hessian converges to the exact Hessian at the optimal policy, and allows for a superlinear convergence in the learning, provided that the policy parametrization is rich. The natural policy gradient method can be interpreted as a particular case of the proposed method. We analytically verify the formulation in a simple linear case and compare the convergence of the proposed method with the natural policy gradient in a nonlinear example.

LGNov 12, 2025
Quasi-Newton Compatible Actor-Critic for Deterministic Policies

Arash Bahari Kordabad, Dean Brandner, Sebastien Gros et al.

In this paper, we propose a second-order deterministic actor-critic framework in reinforcement learning that extends the classical deterministic policy gradient method to exploit curvature information of the performance function. Building on the concept of compatible function approximation for the critic, we introduce a quadratic critic that simultaneously preserves the true policy gradient and an approximation of the performance Hessian. A least-squares temporal difference learning scheme is then developed to estimate the quadratic critic parameters efficiently. This construction enables a quasi-Newton actor update using information learned by the critic, yielding faster convergence compared to first-order methods. The proposed approach is general and applicable to any differentiable policy class. Numerical examples demonstrate that the method achieves improved convergence and performance over standard deterministic actor-critic baselines.

18.0SYMay 5
Almost Sure Reachability in Continuous-time Stochastic Systems

Arash Bahari Kordabad, Rupak Majumdar, Sadegh Soudjani

We provide certificates for almost sure reachability of continuous-time stochastic systems governed by stochastic differential equations (SDEs). We first show that a standard Euler-Maruyama discretization may fail to preserve almost sure reachability property of the system using a double-well Langevin system. This observation motivates us to develop certificates for almost sure reachability directly on the continuous-time system. We introduce a pair of certificates, a drift function and a variant function, and prove necessity and sufficiency for almost sure reachability of an open bounded target set. Using these certificates, for linear SDEs, we give a characterization of almost sure reachability in terms of the spectral structure of the system matrices. For polynomial SDEs, we fix a polynomial template for the drift function and choose the variant function template as an exponential function composed with a polynomial. This allows us to translate the conditions in the certificates into sum-of-squares (SOS) constraints. We then propose an alternating scheme to resolve bilinearities. We illustrate the approach on the double-well Langevin example, showing that continuous-time SOS certificates recover almost sure reachability that is lost under time discretization. Moreover, we verify the SOS approach on a polynomial system.

SYApr 6, 2021
Approximate Robust NMPC using Reinforcement Learning

Hossein Nejatbakhsh Esfahani, Arash Bahari Kordabad, Sebastien Gros

We present a Reinforcement Learning-based Robust Nonlinear Model Predictive Control (RL-RNMPC) framework for controlling nonlinear systems in the presence of disturbances and uncertainties. An approximate Robust Nonlinear Model Predictive Control (RNMPC) of low computational complexity is used in which the state trajectory uncertainty is modelled via ellipsoids. Reinforcement Learning is then used in order to handle the ellipsoidal approximation and improve the closed-loop performance of the scheme by adjusting the MPC parameters generating the ellipsoids. The approach is tested on a simulated Wheeled Mobile Robot (WMR) tracking a desired trajectory while avoiding static obstacles.

LGApr 6, 2021
MPC-based Reinforcement Learning for Economic Problems with Application to Battery Storage

Arash Bahari Kordabad, Wenqi Cai, Sebastien Gros

In this paper, we are interested in optimal control problems with purely economic costs, which often yield optimal policies having a (nearly) bang-bang structure. We focus on policy approximations based on Model Predictive Control (MPC) and the use of the deterministic policy gradient method to optimize the MPC closed-loop performance in the presence of unmodelled stochasticity or model error. When the policy has a (nearly) bang-bang structure, we observe that the policy gradient method can struggle to produce meaningful steps in the policy parameters. To tackle this issue, we propose a homotopy strategy based on the interior-point method, providing a relaxation of the policy during the learning. We investigate a specific well-known battery storage problem, and show that the proposed method delivers a homogeneous and faster learning than a classical policy gradient approach.