Physics-Informed Policy Optimization via Analytic Dynamics Regularization
This work addresses inefficient and unstable robotic control, a problem relevant to researchers and practitioners in robotics and RL, and is presented as a new paradigm rather than an incremental improvement.
The paper tackles high sample complexity and physically inconsistent actions in reinforcement learning for robotic control. It introduces PIPER, a physics-informed RL framework that integrates physical constraints into policy optimization, and reports significant improvements in learning efficiency, stability, and control accuracy.
Reinforcement learning (RL) has achieved strong performance in robotic control; however, state-of-the-art policy learning methods, such as actor-critic methods, still suffer from high sample complexity and often produce physically inconsistent actions. This limitation stems from neural policies having to implicitly rediscover complex physics from data alone, even though accurate dynamics models are readily available in simulators. In this paper, we introduce PIPER, a novel physics-informed RL framework that integrates analytical soft physics constraints directly into neural policy optimization. At the core of our method is a differentiable Lagrangian residual, used as a regularization term in the actor's objective. This residual, derived from the robot's simulator description, subtly biases policy updates towards dynamically consistent solutions. Crucially, the physics integration is realized through an additional loss term during policy optimization, requiring no alterations to existing simulators or core RL algorithms. Extensive experiments demonstrate that our method significantly improves learning efficiency, stability, and control accuracy, establishing a new paradigm for efficient and physically consistent robotic control.
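The abstract does not give the exact form of the residual or the loss, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a hypothetical 1-DoF point-mass pendulum whose Euler-Lagrange equation is m*l^2*qdd + m*g*l*sin(q) = tau. It also assumes that the acceleration qdd is estimated by finite-differencing the velocities in a transition. The hand-written model stands in for dynamics derived from a simulator description. The residual r = M(q)*qdd + G(q) - tau is then evaluated at the policy's torque, and lam * mean(r^2) is added to a standard actor loss -mean(Q). All names (`Pendulum`, `lagrangian_residual`, `regularized_actor_loss`, `lam`) are hypothetical, and the gradient is written out by hand instead of using an autodiff framework.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical transition format: (q, qd, qd_next) = joint angle, velocity, next velocity.
Transition = Tuple[float, float, float]


@dataclass
class Pendulum:
    """Hypothetical 1-DoF point-mass pendulum; q = 0 hangs straight down.

    Stands in for dynamics that would be derived from a simulator description.
    """
    mass: float = 1.0
    length: float = 1.0
    gravity: float = 9.81

    def inverse_dynamics(self, q: float, qdd: float) -> float:
        """Torque implied by the Euler-Lagrange equation: m*l^2*qdd + m*g*l*sin(q)."""
        inertia = self.mass * self.length ** 2
        gravity_torque = self.mass * self.gravity * self.length * math.sin(q)
        return inertia * qdd + gravity_torque


def lagrangian_residual(model: Pendulum, q: float, qd: float, qd_next: float,
                        tau: float, dt: float) -> float:
    """Residual r = M(q)*qdd + G(q) - tau; qdd is a finite-difference estimate (assumption)."""
    qdd = (qd_next - qd) / dt
    return model.inverse_dynamics(q, qdd) - tau


def regularized_actor_loss(model: Pendulum, batch: Sequence[Transition],
                           actions: Sequence[float], q_values: Sequence[float],
                           lam: float, dt: float) -> Tuple[float, List[float]]:
    """Return (-mean(Q) + lam * mean(r^2), gradient of the physics term w.r.t. each action).

    `actions` are the policy's torques pi(s); `q_values` are the critic's Q(s, pi(s)).
    """
    n = len(batch)
    residuals = [lagrangian_residual(model, q, qd, qd_next, a, dt)
                 for (q, qd, qd_next), a in zip(batch, actions)]
    actor_term = -sum(q_values) / n
    physics_term = lam * sum(r * r for r in residuals) / n
    # dr/da = -1, so d(lam * r^2 / n)/da = -2 * lam * r / n.
    grad_physics = [-2.0 * lam * r / n for r in residuals]
    return actor_term + physics_term, grad_physics


if __name__ == "__main__":
    pend = Pendulum()
    batch = [(0.0, 0.0, 1.0)]  # qdd = (1.0 - 0.0) / 0.5 = 2.0, so the consistent torque is 2.0
    loss, grad = regularized_actor_loss(pend, batch, actions=[1.0], q_values=[3.0],
                                        lam=0.5, dt=0.5)
    print(loss, grad)  # -2.5 [-1.0]
```

In this sketch, the policy's torque of 1.0 falls short of the 2.0 that the pendulum's dynamics require, so the physics gradient is negative. A gradient-descent step therefore pushes the action toward the dynamically consistent value. Because the penalty is soft and weighted by `lam`, it biases the update instead of enforcing the dynamics exactly, which matches the "soft constraint" framing above. In a real actor-critic setup, an autodiff library would propagate this term through the policy network's parameters.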