The Value of Planning for Infinite-Horizon Model Predictive Control
This work addresses the brittleness and computational cost of replanning in robot control, though the contribution is incremental, since it builds on existing MPC and planning methods.
The paper tackles myopic decisions in Model Predictive Control (MPC) caused by short prediction horizons. It proposes using the intermediate data structures of planners as an approximate value function, which yields more efficient and resilient behavior in goal-directed robotics tasks such as reaching and navigation.
Model Predictive Control (MPC) is a classic tool for optimal control of complex, real-world systems. Although it has been successfully applied to a wide range of challenging tasks in robotics, it is fundamentally limited by the prediction horizon, which, if too short, will result in myopic decisions. Recently, several papers have suggested using a learned value function as the terminal cost for MPC. If the value function is accurate, it effectively allows MPC to reason over an infinite horizon. Unfortunately, Reinforcement Learning (RL) solutions to value function approximation can be difficult to realize for robotics tasks. In this paper, we suggest a more efficient method for value function approximation that applies to goal-directed problems, like reaching and navigation. In these problems, MPC is often formulated to track a path or trajectory returned by a planner. However, this strategy is brittle in that unexpected perturbations to the robot will require replanning, which can be costly at runtime. Instead, we show how the intermediate data structures used by modern planners can be interpreted as an approximate value function. We show that this value function can be used by MPC directly, resulting in more efficient and resilient behavior at runtime.
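The core idea, reusing a planner's cost-to-go table as the MPC terminal cost, can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the paper's method: Dijkstra's algorithm on a 4-connected grid stands in for the "modern planners" mentioned above, exhaustive search over three-step action sequences stands in for the MPC optimizer, and the grid, `START`, `GOAL`, and helper names (`cost_to_go`, `mpc_action`, `run`) are all hypothetical. For contrast, the same short-horizon MPC is also run with straight-line distance to the goal as its terminal cost, a hypothetical myopic baseline.

```python
import heapq
import itertools
import math

# Hypothetical toy occupancy grid (not from the paper): '#' = obstacle, '.' = free.
# The U-shaped wall opens to the left, i.e. away from the goal.
GRID = [
    "............",
    "..#######...",
    "........#...",
    "........#...",
    "........#...",
    "..#######...",
    "............",
]
START, GOAL = (3, 3), (3, 10)  # (row, col)
ACTIONS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]  # stay, up, down, left, right


def free(cell):
    r, c = cell
    return 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] != "#"


def step(cell, action):
    """Deterministic model: move one cell, or stay put if the move is blocked."""
    nxt = (cell[0] + action[0], cell[1] + action[1])
    return nxt if free(nxt) else cell


def cost_to_go(goal):
    """Dijkstra outward from the goal. The resulting distance table is the
    planner's intermediate data structure, reused as an approximate value function."""
    dist = {goal: 0.0}
    heap = [(0.0, goal)]
    while heap:
        d, cell = heapq.heappop(heap)
        if d > dist[cell]:
            continue  # stale heap entry
        for a in ACTIONS[1:]:  # skip "stay"
            nxt = (cell[0] + a[0], cell[1] + a[1])
            if free(nxt) and d + 1.0 < dist.get(nxt, math.inf):
                dist[nxt] = d + 1.0
                heapq.heappush(heap, (d + 1.0, nxt))
    return lambda cell: dist.get(cell, math.inf)


def mpc_action(cell, goal, terminal, horizon=3):
    """Exhaustive-search MPC: score each action sequence by the number of steps
    that end away from the goal plus terminal(final state); return the first action."""
    best_cost, best_first = math.inf, ACTIONS[0]
    for seq in itertools.product(ACTIONS, repeat=horizon):
        s, cost = cell, 0.0
        for a in seq:
            s = step(s, a)
            cost += 0.0 if s == goal else 1.0
        cost += terminal(s)
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first


def run(terminal, max_steps=40):
    """Receding-horizon loop: re-solve the short-horizon problem at every step."""
    cell = START
    for t in range(max_steps):
        if cell == GOAL:
            return cell, t
        cell = step(cell, mpc_action(cell, GOAL, terminal))
    return cell, max_steps
```

In this toy setup, `run(cost_to_go(GOAL))` reaches the goal in 17 steps, the shortest-path length, because the table already encodes the detour around the U-shaped wall. `run(lambda s: math.dist(s, GOAL))` never reaches the goal: no three-step lookahead reveals that moving away from the goal is the only way out. Because the table covers every free cell, a perturbation that knocks the robot onto another cell needs no replanning in this sketch; the next MPC solve simply reads a different entry.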