Policy Optimization for Unknown Systems using Differentiable Model Predictive Control
For control and robotics practitioners, this work offers a method to improve MPC policy performance under model mismatch, but the improvement is incremental over existing hybrid approaches.
The paper addresses model-based policy optimization when the system dynamics model is inaccurate, with a focus on MPC policies. It introduces a framework that combines differentiable optimization with zeroth-order optimization, achieving faster transient performance than fully data-driven methods while maintaining convergence guarantees under model uncertainty. The approach is demonstrated on a nonlinear control task with a 12-dimensional quadcopter model.
Model-based policy optimization often struggles with inaccurate system dynamics models, leading to suboptimal closed-loop performance. This challenge is especially evident in Model Predictive Control (MPC) policies, which rely on the model for real-time trajectory planning and optimization. We introduce a novel policy optimization framework for MPC-based policies that pairs differentiable optimization with zeroth-order optimization. The method combines model-based and model-free gradient estimates, achieving faster transient performance than fully data-driven approaches while maintaining convergence guarantees, even under model uncertainty. We demonstrate the effectiveness of the proposed approach on a nonlinear control task involving a 12-dimensional quadcopter model.
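To make the idea of mixing model-based and model-free gradient estimates concrete, here is a minimal sketch on a toy problem. It is not the paper's algorithm. Everything in it is an assumption made for illustration: the quadratic `true_cost` standing in for the closed-loop cost of the real system, the deliberately biased `model_gradient` standing in for a gradient obtained by differentiating through an MPC problem built on a mismatched model, the two-point random-direction estimator, and the decaying mixing weight `w`.

```python
import numpy as np

def true_cost(theta):
    # Hypothetical closed-loop cost of the real (unknown) system.
    # Its minimizer is theta = 1 in every coordinate.
    return float(np.sum((theta - 1.0) ** 2))

def model_gradient(theta):
    # Hypothetical model-based gradient from an inaccurate model:
    # cheap and noise-free, but it points toward 0.8 instead of 1.0.
    return 2.0 * (theta - 0.8)

def zeroth_order_gradient(cost, theta, rng, radius=1e-2, num_samples=8):
    # Two-point random-direction estimator: uses only cost evaluations,
    # so it is unaffected by model mismatch but is noisy.
    d = theta.size
    grad = np.zeros(d)
    for _ in range(num_samples):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)  # random direction on the unit sphere
        diff = cost(theta + radius * u) - cost(theta - radius * u)
        grad += d * diff / (2.0 * radius) * u
    return grad / num_samples

def hybrid_descent(theta0, steps=300, lr=0.05, seed=0):
    # Gradient descent on a weighted mix of the two estimates.
    # The weight on the model-based gradient decays over iterations
    # (an assumed schedule, chosen only for this illustration).
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for k in range(steps):
        w = 1.0 / (1.0 + 0.1 * k)
        g_model = model_gradient(theta)
        g_data = zeroth_order_gradient(true_cost, theta, rng)
        theta -= lr * (w * g_model + (1.0 - w) * g_data)
    return theta

theta_hybrid = hybrid_descent(np.zeros(3))
```

In this toy setting, following `model_gradient` alone would settle at the biased point 0.8, while the zeroth-order term steers the iterate toward the true minimizer as the weight `w` shrinks. The model-based term mainly helps in the early iterations, which is the intuition behind the transient-performance claim above.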