OCSYSYSep 5, 2020

A Convex Optimization Approach to Dynamic Programming in Continuous State and Action Spaces

arXiv:1810.038478 citationsh-index: 19
AI Analysis

It offers a computationally efficient approach with theoretical guarantees for solving dynamic programs in continuous spaces, which is relevant for control and reinforcement learning practitioners.

The paper proposes a convex optimization-based method for dynamic programming in continuous state and action spaces, achieving uniform convergence of the approximate value function to the optimum for convex cases and providing a provable suboptimality bound for control-affine systems.

In this paper, a convex optimization-based method is proposed for numerically solving dynamic programs in continuous state and action spaces. The key idea is to approximate the output of the Bellman operator at a particular state by the optimal value of a convex program. The approximate Bellman operator has a computational advantage because it involves a convex optimization problem in the case of control-affine systems and convex costs. Using this feature, we propose a simple dynamic programming algorithm to evaluate the approximate value function at pre-specified grid points by solving convex optimization problems in each iteration. We show that the proposed method approximates the optimal value function with a uniform convergence property in the case of convex optimal value functions. We also propose an interpolation-free design method for a control policy, of which performance converges uniformly to the optimum as the grid resolution becomes finer. When a nonlinear control-affine system is considered, the convex optimization approach provides an approximate policy with a provable suboptimality bound. For general cases, the proposed convex formulation of dynamic programming operators can be modified as a nonconvex bi-level program, in which the inner problem is a linear program, without losing uniform convergence properties.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes