RO · Mar 3, 2021

Policy Decomposition: Approximate Optimal Control with Suboptimality Estimates

arXiv:2103.02716v13.0

Originality: Highly original

AI Analysis

This work addresses the challenge of quantifying suboptimality in approximate optimal control for intractable systems, offering a potential alternative for robotics and control applications. Its main open issue is the combinatorics of choosing among the many possible decompositions.

The authors propose policy decomposition, a method for approximating optimal control of complex dynamical systems that comes with explicit suboptimality estimates. On a cart-pole, a 3-link balancing biped, and N-link planar manipulators, the method yields control policies in a fraction of the time needed to compute the optimal control, with no notable loss in closed-loop performance.

Numerically computing global policies to optimal control problems for complex dynamical systems is mostly intractable. In consequence, a number of approximation methods have been developed. However, none of the current methods can quantify by how much the resulting control underperforms the elusive globally optimal solution. Here we propose policy decomposition, an approximation method with explicit suboptimality estimates. Our method decomposes the optimal control problem into lower-dimensional subproblems, whose optimal solutions are recombined to build a control policy for the entire system. Many such combinations exist, and we introduce the value error and its LQR and DDP estimates to predict the suboptimality of possible combinations and prioritize the ones that minimize it. Using a cart-pole, a 3-link balancing biped and N-link planar manipulators as example systems, we find that the estimates correctly identify the best combinations, yielding control policies in a fraction of the time it takes to compute the optimal control without a notable sacrifice in closed-loop performance. While more research will be needed to find ways of dealing with the combinatorics of policy decomposition, the results suggest this method could be an effective alternative for approximating optimal control in intractable systems.
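To make the idea of an LQR-based value error concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes a discrete-time linear system, considers only a fully decoupled split of states and inputs, and measures the error as the extra quadratic cost of the recombined policy over the optimal LQR policy. The toy system and all function names (`dlqr`, `policy_cost`, `decoupled_gain`, `value_error_estimate`, `make_system`) are hypothetical and chosen for illustration.

```python
import numpy as np

def dlqr(A, B, Q, R, iters=2000):
    """Discrete-time LQR gain and cost matrix via Riccati value iteration."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

def policy_cost(A, B, Q, R, K):
    """Cost matrix P of the fixed policy u = -K x, so that V(x) = x' P x."""
    Acl = A - B @ K
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        raise ValueError("policy does not stabilize the full system")
    W = Q + K.T @ R @ K
    n = A.shape[0]
    # Solve the Lyapunov equation P = W + Acl' P Acl as a linear system in P.
    p = np.linalg.solve(np.eye(n * n) - np.kron(Acl.T, Acl.T), W.reshape(-1))
    return p.reshape(n, n)

def decoupled_gain(A, B, Q, R, state_groups, input_groups):
    """Solve one LQR per subsystem (ignoring cross terms) and recombine the gains."""
    K = np.zeros((B.shape[1], A.shape[0]))
    for s, u in zip(state_groups, input_groups):
        K_sub, _ = dlqr(A[np.ix_(s, s)], B[np.ix_(s, u)],
                        Q[np.ix_(s, s)], R[np.ix_(u, u)])
        K[np.ix_(u, s)] = K_sub
    return K

def value_error_estimate(A, B, Q, R, K, x0):
    """Extra cost of policy K over the optimal LQR policy, starting from x0."""
    _, P_opt = dlqr(A, B, Q, R)
    P_dec = policy_cost(A, B, Q, R, K)
    return float(x0 @ (P_dec - P_opt) @ x0)

def make_system(coupling, dt=0.1):
    """Toy example (an assumption): two double integrators with weak coupling."""
    A1 = np.array([[1.0, dt], [0.0, 1.0]])
    B1 = np.array([[0.5 * dt**2], [dt]])
    A = np.kron(np.eye(2), A1)
    B = np.kron(np.eye(2), B1)
    A[1, 2] = coupling  # position of subsystem 2 pushes velocity of subsystem 1
    A[3, 0] = coupling  # and vice versa
    return A, B, np.eye(4), np.eye(2)

# One candidate decomposition: states {0, 1} with input 0, states {2, 3} with input 1.
STATE_GROUPS = [[0, 1], [2, 3]]
INPUT_GROUPS = [[0], [1]]

if __name__ == "__main__":
    A, B, Q, R = make_system(coupling=0.02)
    K = decoupled_gain(A, B, Q, R, STATE_GROUPS, INPUT_GROUPS)
    print("estimated value error:", value_error_estimate(A, B, Q, R, K, np.ones(4)))
```

Under these assumptions, ranking candidate decompositions amounts to calling `value_error_estimate` once per grouping and sorting the results. The appeal of such an estimate is that it is cheap to evaluate before committing compute to the nonlinear subproblems.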
