RO · LG · Nov 3, 2020

Differentiable Physics Models for Real-world Offline Model-based Reinforcement Learning

arXiv:2011.01734v1 · 44 citations
AI Analysis

This work addresses the challenge of data-efficient and reliable MBRL for robotics, offering a solution that improves policy viability in real-world settings, though it is incremental in extending physics models to nonholonomic systems.

The paper tackles the problem of offline model-based reinforcement learning (MBRL) by showing that physics-based models outperform black-box models in real-world tasks like ball-in-a-cup, achieving success with only 4 minutes of data, while black-box models fail due to physically impossible predictions.

A limitation of model-based reinforcement learning (MBRL) is the exploitation of errors in the learned models. Black-box models can fit complex dynamics with high fidelity, but their behavior is undefined outside of the data distribution. Physics-based models are better at extrapolating, due to the general validity of their informed structure, but underfit in the real world due to the presence of unmodeled phenomena. In this work, we demonstrate experimentally that for the offline model-based reinforcement learning setting, physics-based models can be beneficial compared to high-capacity function approximators if the mechanical structure is known. With offline MBRL, physics-based models can learn to perform the ball in a cup (BiC) task on a physical manipulator using only 4 minutes of sampled data. We find that black-box models consistently produce unviable policies for BiC, as all predicted trajectories diverge to physically impossible states, despite having access to more data than the physics-based model. In addition, we generalize the approach of physics parameter identification from modeling holonomic multi-body systems to systems with nonholonomic dynamics using end-to-end automatic differentiation. Videos: https://sites.google.com/view/ball-in-a-cup-in-4-minutes/
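To make "physics parameter identification using end-to-end automatic differentiation" concrete, here is a minimal JAX sketch. It is not the paper's implementation: a damped pendulum stands in for the ball-in-a-cup system, the data are synthetic rather than recorded on a robot, and the names `step`, `rollout`, `loss_fn`, and `adam_update`, the one-step prediction loss, the hand-written Adam update, and all numeric values are illustrative assumptions. The idea it shows is that the model structure is fixed by physics and only a few physical parameters (here length and damping) are fitted by differentiating through the simulator step.

```python
import jax
import jax.numpy as jnp

G = 9.81   # gravity [m/s^2]
DT = 0.01  # integration step [s] (assumed value)

def step(log_params, state):
    """One semi-implicit Euler step of a damped pendulum (toy stand-in system)."""
    length = jnp.exp(log_params[0])   # exp keeps the physical parameters positive
    damping = jnp.exp(log_params[1])
    theta, omega = state[0], state[1]
    acc = -(G / length) * jnp.sin(theta) - damping * omega
    omega = omega + DT * acc
    theta = theta + DT * omega
    return jnp.stack([theta, omega])

def rollout(log_params, state0, n_steps):
    """Simulate n_steps and return the trajectory including the initial state."""
    def body(state, _):
        nxt = step(log_params, state)
        return nxt, nxt
    _, traj = jax.lax.scan(body, state0, None, length=n_steps)
    return jnp.concatenate([state0[None], traj])

def loss_fn(log_params, states):
    """Mean squared one-step prediction error, scaled by 1/DT for conditioning."""
    pred = jax.vmap(step, in_axes=(None, 0))(log_params, states[:-1])
    err = (pred - states[1:]) / DT
    return jnp.mean(err ** 2)

@jax.jit
def adam_update(log_params, m, v, t, states):
    """One hand-written Adam step; gradients flow through the physics step."""
    loss, g = jax.value_and_grad(loss_fn)(log_params, states)
    m = 0.9 * m + 0.1 * g
    v = 0.999 * v + 0.001 * g ** 2
    m_hat = m / (1.0 - 0.9 ** t)
    v_hat = v / (1.0 - 0.999 ** t)
    new_params = log_params - 0.05 * m_hat / (jnp.sqrt(v_hat) + 1e-8)
    return new_params, m, v, loss

# Synthetic "offline dataset": 5 s of motion from a pendulum with known parameters.
true_params = jnp.array([0.5, 0.3])  # length [m], damping [1/s] (made-up values)
states = rollout(jnp.log(true_params), jnp.array([1.0, 0.0]), 500)

# Start from a deliberately wrong guess and identify the parameters.
log_params = jnp.log(jnp.array([1.0, 0.1]))
m, v = jnp.zeros(2), jnp.zeros(2)
initial_loss = float(loss_fn(log_params, states))
for t in range(1, 501):
    log_params, m, v, _ = adam_update(log_params, m, v, float(t), states)
final_loss = float(loss_fn(log_params, states))

length_est, damping_est = (float(x) for x in jnp.exp(log_params))
print(f"length ~ {length_est:.3f} m, damping ~ {damping_est:.3f} 1/s")
```

Because only two parameters are free, a few seconds of data are enough to pin them down in this toy setting, and the fitted model cannot predict motion that the pendulum equations rule out. A black-box network trained on the same data has no such constraint outside the states it has seen, which is the failure mode the abstract describes.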
