LG AI RO | Feb 8, 2024

Offline Actor-Critic Reinforcement Learning Scales to Large Models

Jost Tobias Springenberg, Abbas Abdolmaleki, Jingwei Zhang, Oliver Groth, Michael Bloesch, Thomas Lampe, Philemon Brakel, Sarah Bechtle, Steven Kapturowski, Roland Hafner, Nicolas Heess, Martin Riedmiller

arXiv:2402.05546v1 | 25.438 citations | h-index: 57 | ICML

Originality: Incremental advance

AI Analysis

The work enables learning multi-task policies from sub-optimal data, advancing offline RL for real robotics and AI applications, though it is an incremental improvement over existing methods.

The paper tackles the problem of scaling offline actor-critic reinforcement learning to large models such as transformers, and shows that the approach outperforms behavioral cloning baselines on 132 continuous control tasks under multi-task training.

We show that offline actor-critic reinforcement learning can scale to large models, such as transformers, and follows scaling laws similar to those of supervised learning. We find that offline actor-critic algorithms can outperform strong, supervised, behavioral cloning baselines for multi-task training on a large dataset containing both sub-optimal and expert behavior on 132 continuous control tasks. We introduce a Perceiver-based actor-critic model and elucidate the key model features needed to make offline RL work with self- and cross-attention modules. Overall, we find that: i) simple offline actor-critic algorithms are a natural choice for gradually moving away from the currently predominant paradigm of behavioral cloning, and ii) via offline RL it is possible to learn multi-task policies that master many domains simultaneously, including real robotics tasks, from sub-optimal demonstrations or self-generated data.
