EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data
This work addresses the problem of applying reinforcement learning to real-world tasks with limited data, and it represents a strong incremental improvement over existing methods.
The paper tackles the challenge of sample efficiency in reinforcement learning by introducing EfficientZero V2, a general framework that outperforms the state-of-the-art DreamerV3 in 50 out of 66 tasks across diverse benchmarks under limited data settings.
Sample efficiency remains a crucial challenge in applying Reinforcement Learning (RL) to real-world tasks. While recent algorithms have made significant strides in improving sample efficiency, none have achieved consistently superior performance across diverse domains. In this paper, we introduce EfficientZero V2, a general framework designed for sample-efficient RL algorithms. We extend EfficientZero to multiple domains, encompassing both continuous and discrete actions, as well as visual and low-dimensional inputs. With the series of improvements we propose, EfficientZero V2 outperforms the current state-of-the-art (SOTA) by a significant margin in diverse tasks under the limited data setting. EfficientZero V2 exhibits a notable advancement over the prevailing general algorithm, DreamerV3, achieving superior outcomes in 50 of 66 evaluated tasks across diverse benchmarks, such as Atari 100k, Proprio Control, and Vision Control.