Policy Optimization for Unknown Systems using Differentiable Model Predictive Control

Riccardo Zuliani, Efe C. Balta, John Lygeros

arXiv:2511.11308 · 7.8 · h-index: 15

Predicted impact top 80% in SY · last 90 days · Originality: Incremental advance

AI Analysis

For control and robotics practitioners, this work offers a method to improve MPC policy performance under model mismatch, but the improvement is incremental over existing hybrid approaches.

The paper addresses the challenge of model-based policy optimization under model uncertainty, particularly for MPC policies. It introduces a framework combining differentiable and zeroth-order optimization, achieving faster transient performance than data-driven methods while maintaining convergence guarantees, demonstrated on a 12-dimensional quadcopter control task.

Model-based policy optimization often struggles with inaccurate system dynamics models, leading to suboptimal closed-loop performance. This challenge is especially evident in Model Predictive Control (MPC) policies, which rely on the model for real-time trajectory planning and optimization. We introduce a novel policy optimization framework for MPC-based policies that combines differentiable optimization with zeroth-order optimization. By pairing model-based and model-free gradient estimates, our method achieves faster transient performance than fully data-driven approaches while maintaining convergence guarantees, even under model uncertainty. We demonstrate the effectiveness of the proposed approach on a nonlinear control task involving a 12-dimensional quadcopter model.
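To make the idea of mixing model-based and model-free gradient estimates concrete, here is a minimal toy sketch. It is not the paper's algorithm: the two decoupled scalar systems, the linear feedback policy, the constants `A_TRUE` and `A_MODEL`, and the fixed blending weight `alpha` are all assumptions chosen for illustration. Central differences on the (mismatched) model stand in for differentiating through an MPC policy, and a two-point random-direction estimate uses only closed-loop costs measured on the "true" system.

```python
import random

A_TRUE = (0.9, 0.8)   # hypothetical true dynamics, unknown to the designer
A_MODEL = (0.7, 0.9)  # hypothetical mismatched model of the same system


def closed_loop_cost(theta, a, horizon=10, r=0.1):
    """Cost of u_i = -theta_i * x_i on decoupled dynamics x_i+ = a_i * x_i + u_i."""
    cost = 0.0
    for th, ai in zip(theta, a):
        x = 1.0  # each state starts at 1
        for _ in range(horizon):
            u = -th * x
            cost += x * x + r * u * u
            x = ai * x + u
    return cost


def model_gradient(theta, eps=1e-6):
    """Gradient of the model-predicted cost (assumed stand-in for differentiable MPC)."""
    grad = []
    for i in range(len(theta)):
        up, dn = list(theta), list(theta)
        up[i] += eps
        dn[i] -= eps
        diff = closed_loop_cost(up, A_MODEL) - closed_loop_cost(dn, A_MODEL)
        grad.append(diff / (2 * eps))
    return grad


def zeroth_order_gradient(theta, rng, delta=1e-2):
    """Two-point estimate along a random sign vector, using true-system costs only."""
    d = [rng.choice((-1.0, 1.0)) for _ in theta]
    up = [t + delta * di for t, di in zip(theta, d)]
    dn = [t - delta * di for t, di in zip(theta, d)]
    slope = (closed_loop_cost(up, A_TRUE) - closed_loop_cost(dn, A_TRUE)) / (2 * delta)
    return [slope * di for di in d]


def optimize(theta0=(0.0, 0.0), alpha=0.5, step=0.002, iters=300, seed=0):
    """Gradient descent on a fixed blend of the two estimates (illustrative choice)."""
    rng = random.Random(seed)
    theta = list(theta0)
    for _ in range(iters):
        g_model = model_gradient(theta)
        g_zo = zeroth_order_gradient(theta, rng)
        theta = [
            t - step * (alpha * gm + (1 - alpha) * gz)
            for t, gm, gz in zip(theta, g_model, g_zo)
        ]
    return theta


if __name__ == "__main__":
    theta = optimize()
    print("initial cost:", closed_loop_cost((0.0, 0.0), A_TRUE))
    print("final cost:  ", closed_loop_cost(theta, A_TRUE))
```

In this toy setting, the model gradient is cheap but biased by the mismatch between `A_MODEL` and `A_TRUE`, while the zeroth-order estimate is unbiased up to finite-difference error but noisy. A fixed blend like this inherits some of the model bias, so it should be read only as an illustration of the two ingredients, not as a scheme that carries the convergence guarantees described above.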

