What Matters for Simulation to Online Reinforcement Learning on Real Robots
This work addresses the challenge practitioners face in deploying online RL on real robots, providing empirical guidance that lowers engineering barriers.
The study systematically tested design choices for online reinforcement learning on physical robots across 100 real-world runs, finding that some common defaults are harmful while robust alternatives enable stable learning with reduced engineering effort.
We investigate which specific design choices enable successful online reinforcement learning (RL) on physical robots. Across 100 real-world training runs on three distinct robotic platforms, we systematically ablate algorithmic, systems, and experimental decisions that are typically left implicit in prior work. We find that some widely used defaults can be harmful, while a set of robust, readily adopted design choices within standard RL practice yields stable learning across tasks and hardware. Together, these results constitute the first large-sample empirical study of such design choices, enabling practitioners to deploy online RL with lower engineering effort.