Reset-free Reinforcement Learning with World Models
This work reduces the human supervision needed for RL training, a practical concern for AI researchers and developers, though it builds incrementally on existing model-based approaches.
The paper addresses the reset-free setting, where agents must learn without manual resets, in order to reduce human effort in reinforcement learning. The proposed model-based reset-free (MoReFree) agent outperforms prior state-of-the-art methods and privileged baselines, showing superior data efficiency without access to environmental rewards or demonstrations.
Reinforcement learning (RL) is an appealing paradigm for training intelligent agents, enabling policy acquisition from the agent's own autonomously acquired experience. However, the training process of RL is far from automatic, requiring extensive human effort to reset the agent and environments. To tackle the challenging reset-free setting, we first demonstrate the superiority of model-based (MB) RL methods in this setting, showing that a straightforward adaptation of MBRL can outperform all prior state-of-the-art methods while requiring less supervision. We then identify limitations inherent to this direct extension and propose a solution called the model-based reset-free (MoReFree) agent, which further enhances performance. MoReFree adapts two key mechanisms, exploration and policy learning, to handle reset-free tasks by prioritizing task-relevant states. It exhibits superior data-efficiency across various reset-free tasks without access to environmental reward or demonstrations while significantly outperforming privileged baselines that require supervision. Our findings suggest model-based methods hold significant promise for reducing human effort in RL. Website: https://yangzhao-666.github.io/morefree
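
To make the reset-free setting concrete, the following is a minimal illustrative sketch and not the MoReFree implementation. It shows an interaction loop in which the environment is reset exactly once and the agent then pursues a sequence of goals from wherever it currently is. Everything named here is hypothetical: the toy `ChainEnv`, the helpers `sample_goal` and `run_reset_free`, the mixing parameter `alpha`, and the hand-coded policy that stands in for a learned goal-conditioned policy. The abstract does not specify how task-relevant states are prioritized; mixing task-relevant goals (here assumed to be the initial state and the evaluation goal) with random exploratory goals is an assumption made for illustration only.

```python
import random


class ChainEnv:
    """Toy 1-D chain with positions 0..n-1 (hypothetical, for illustration)."""

    def __init__(self, n=10):
        self.n = n
        self.pos = 0
        self.resets = 0  # counts how often a (costly) reset was requested

    def reset(self):
        self.pos = 0
        self.resets += 1
        return self.pos

    def step(self, action):
        # action is -1 or +1; the position is clipped to the chain bounds
        self.pos = max(0, min(self.n - 1, self.pos + action))
        return self.pos


def sample_goal(rng, task_goals, n_states, alpha):
    """With probability alpha pick a task-relevant goal, else a random state."""
    if rng.random() < alpha:
        return rng.choice(task_goals)
    return rng.randrange(n_states)


def run_reset_free(num_segments=30, horizon=20, alpha=0.5, seed=0):
    rng = random.Random(seed)
    env = ChainEnv()
    state = env.reset()  # the only reset in the whole run
    # Assumed task-relevant states: the initial state and the evaluation goal.
    task_goals = [0, env.n - 1]
    goals, steps = [], 0
    for _ in range(num_segments):
        goal = sample_goal(rng, task_goals, env.n, alpha)
        goals.append(goal)
        for _ in range(horizon):
            if state == goal:
                break
            # Placeholder for a learned goal-conditioned policy.
            action = 1 if goal > state else -1
            state = env.step(action)
            steps += 1
        # No env.reset() here: the next segment starts from the current state.
    return env.resets, steps, goals
```

Calling `run_reset_free(alpha=1.0)` makes every segment target either the initial state or the evaluation goal, while `alpha=0.0` yields purely random goals; in both cases `env.reset()` is invoked only once, which is the defining constraint of the setting.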