LG | AI | RO | Feb 8, 2024

Offline Actor-Critic Reinforcement Learning Scales to Large Models

arXiv:2402.05546v1 | 38 citations | h-index: 72 | ICML
Originality: Incremental advance
AI Analysis

The approach enables learning multi-task policies from sub-optimal data, which advances offline RL for real robotics and AI applications, though the contribution is an incremental improvement on existing methods.

The paper scales offline actor-critic reinforcement learning to large models such as transformers and shows that, with multi-task training, it outperforms behavioral cloning baselines on 132 continuous control tasks.

We show that offline actor-critic reinforcement learning can scale to large models, such as transformers, and follows scaling laws similar to those of supervised learning. We find that offline actor-critic algorithms can outperform strong, supervised, behavioral cloning baselines for multi-task training on a large dataset containing both sub-optimal and expert behavior on 132 continuous control tasks. We introduce a Perceiver-based actor-critic model and elucidate the key model features needed to make offline RL work with self- and cross-attention modules. Overall, we find that: i) simple offline actor-critic algorithms are a natural choice for gradually moving away from the currently predominant paradigm of behavioral cloning, and ii) via offline RL it is possible to learn multi-task policies that master many domains simultaneously, including real robotics tasks, from sub-optimal demonstrations or self-generated data.
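To make the contrast between behavioral cloning and an offline actor-critic objective concrete, here is a minimal sketch. This summary does not state the paper's exact loss, so the sketch substitutes a generic advantage-weighted policy loss (in the style of advantage-weighted regression) as a stand-in. The function names, the `temperature` and `max_weight` parameters, and the toy numbers are all illustrative assumptions, not values from the paper.

```python
import numpy as np


def bc_loss(log_probs):
    """Behavioral cloning: every dataset action is imitated with equal weight."""
    return -np.mean(log_probs)


def advantage_weighted_loss(log_probs, q_values, state_values,
                            temperature=1.0, max_weight=20.0):
    """Illustrative offline actor loss (assumed form, not the paper's).

    Each dataset action's log-likelihood is weighted by exp(advantage / temperature),
    where the advantage comes from a learned critic. Actions the critic rates as
    better than average are imitated more strongly than poor ones.
    """
    advantages = q_values - state_values
    # Clip the exponentiated advantages to keep the weights numerically stable.
    weights = np.minimum(np.exp(advantages / temperature), max_weight)
    # Normalize so the mean weight is 1, keeping the scale comparable to bc_loss.
    weights = weights / np.mean(weights)
    return -np.mean(weights * log_probs)


# Toy batch (made-up numbers): policy log-probabilities of four dataset actions,
# plus hypothetical critic estimates Q(s, a) and V(s) for the same transitions.
log_probs = np.array([-1.0, -1.0, -3.0, -0.5])
q_values = np.array([1.0, 0.2, 2.0, -1.0])
state_values = np.array([0.5, 0.5, 0.5, 0.5])

bc = bc_loss(log_probs)
awl = advantage_weighted_loss(log_probs, q_values, state_values)
print("BC loss:", bc)
print("Advantage-weighted loss:", awl)
```

When all advantages are zero the weighted loss reduces to plain behavioral cloning, which is one way to read the abstract's framing of offline actor-critic methods as a gradual move away from behavioral cloning. With sub-optimal data, the critic-derived weights let the policy favor the better actions in the dataset rather than imitating everything equally.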

