Generalizing Consistency Policy to Visual RL with Prioritized Proximal Experience Regularization
This work addresses the problem of improving sample efficiency and stability in visual RL for robotics and control applications, representing a novel extension of consistency models to this domain.
The paper tackled the challenge of low sample efficiency and training instability in visual reinforcement learning (RL) with high-dimensional state spaces by proposing a consistency policy with prioritized proximal experience regularization (CP3ER), which achieved state-of-the-art performance in 21 tasks across DeepMind control suite and Meta-world.
With high-dimensional state spaces, visual reinforcement learning (RL) faces significant challenges in exploitation and exploration, resulting in low sample efficiency and training stability. As a time-efficient diffusion model, although consistency models have been validated in online state-based RL, it is still an open question whether it can be extended to visual RL. In this paper, we investigate the impact of non-stationary distribution and the actor-critic framework on consistency policy in online RL, and find that consistency policy was unstable during the training, especially in visual RL with the high-dimensional state space. To this end, we suggest sample-based entropy regularization to stabilize the policy training, and propose a consistency policy with prioritized proximal experience regularization (CP3ER) to improve sample efficiency. CP3ER achieves new state-of-the-art (SOTA) performance in 21 tasks across DeepMind control suite and Meta-world. To our knowledge, CP3ER is the first method to apply diffusion/consistency models to visual RL and demonstrates the potential of consistency models in visual RL. More visualization results are available at https://jzndd.github.io/CP3ER-Page/.