LG | SY | Apr 1, 2025

MPCritic: A plug-and-play MPC architecture for reinforcement learning

arXiv:2504.01086v2 | 2 citations | h-index: 10 | CDC
Originality: Incremental advance
AI Analysis

The paper addresses the computational and software-integration barriers that researchers and practitioners face when combining control and RL. The contribution appears incremental: it builds on existing RL-MPC synergies and does not claim broad state-of-the-art results.

The paper tackles the challenge of integrating reinforcement learning (RL) and model predictive control (MPC) by proposing MPCritic, a plug-and-play architecture that avoids costly optimization and enables seamless use of MPC tools, demonstrating its versatility on classic control benchmarks.

The reinforcement learning (RL) and model predictive control (MPC) communities have developed vast ecosystems of theoretical approaches and computational tools for solving optimal control problems. Given their conceptual similarities but differing strengths, there has been increasing interest in synergizing RL and MPC. However, existing approaches tend to be limited for various reasons, including the computational cost of MPC in an RL algorithm and software hurdles to seamless integration of MPC and RL tools. These challenges often result in the use of "simple" MPC schemes or RL algorithms, neglecting the state-of-the-art in both areas. This paper presents MPCritic, a machine learning-friendly architecture that interfaces seamlessly with MPC tools. MPCritic utilizes the loss landscape defined by a parameterized MPC problem, focusing on "soft" optimization over batched training steps, thereby updating the MPC parameters while avoiding costly minimization and parametric sensitivities. Since the MPC structure is preserved during training, an MPC agent can be readily used for online deployment, where robust constraint satisfaction is paramount. We demonstrate the versatility of MPCritic, in terms of MPC architectures and RL algorithms that it can accommodate, on classic control benchmarks.
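The "soft" optimization idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, only one possible reading of the abstract: instead of solving the MPC problem to optimality at every training step, take a single gradient step on the MPC objective, averaged over a batch of sampled states. Everything here is an assumption made for illustration: the scalar system (`A`, `B`), the cost weights (`Q`, `R`, `P`), the linear feedback `u = -k*x` standing in for a learned policy, and the finite-difference gradient standing in for automatic differentiation.

```python
import random

# Hypothetical scalar system and cost, chosen only for illustration.
A, B = 1.0, 0.5          # dynamics: x_next = A*x + B*u
Q, R, P = 1.0, 0.1, 1.0  # stage cost Q*x^2 + R*u^2, terminal cost P*x^2
HORIZON = 5


def mpc_objective(k, x0):
    """Horizon cost from x0 when inputs follow u = -k*x (no solver call)."""
    x, cost = x0, 0.0
    for _ in range(HORIZON):
        u = -k * x
        cost += Q * x * x + R * u * u
        x = A * x + B * u
    return cost + P * x * x


def batch_loss(k, states):
    """Mean MPC objective over a batch of sampled states."""
    return sum(mpc_objective(k, x) for x in states) / len(states)


def soft_step(k, states, lr=0.02, eps=1e-5):
    """One gradient step on the batch loss (central finite difference)."""
    grad = (batch_loss(k + eps, states) - batch_loss(k - eps, states)) / (2 * eps)
    return k - lr * grad


def train(steps=200, batch_size=32, seed=0):
    """Run batched soft-optimization steps; return final gain and its history."""
    rng = random.Random(seed)
    k, history = 0.0, []
    for _ in range(steps):
        states = [rng.uniform(-1.0, 1.0) for _ in range(batch_size)]
        k = soft_step(k, states)
        history.append(k)
    return k, history


if __name__ == "__main__":
    k_final, _ = train()
    held_out = [i / 10 for i in range(-10, 11)]
    print("initial loss:", batch_loss(0.0, held_out))
    print("final loss:  ", batch_loss(k_final, held_out))
```

Each training step costs only a few rollouts of the model, with no inner minimization and no sensitivity computation through a solver. The cost structure (horizon, stage cost, terminal cost) is left untouched, which is the property the abstract relies on when it says an MPC agent can still be deployed after training.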

Code Implementations: 1 repo
