DreamerV3-XP: Optimizing exploration through uncertainty estimation
This is an incremental improvement, relevant to reinforcement learning practitioners working on sample-efficient algorithms.
The paper addresses exploration and learning efficiency in reinforcement learning. It extends DreamerV3 with a prioritized replay buffer and an intrinsic reward based on ensemble disagreement, which yields faster learning and lower dynamics model loss, especially in sparse-reward settings.
We introduce DreamerV3-XP, an extension of DreamerV3 that improves exploration and learning efficiency. It adds (i) a prioritized replay buffer that scores trajectories by return, reconstruction loss, and value error, and (ii) an intrinsic reward based on disagreement over predicted environment rewards from an ensemble of world models. We evaluate DreamerV3-XP on a subset of Atari100k and DeepMind Control Visual Benchmark tasks. The results confirm the original DreamerV3 findings and show that our extensions lead to faster learning and lower dynamics model loss, particularly in sparse-reward settings.
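The two extensions can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names the signals (return, reconstruction loss, value error, ensemble disagreement over predicted rewards) but not how they are combined. The weighted sum, the proportional sampling rule, the use of variance as the disagreement measure, and every function and parameter name below (`trajectory_priority`, `sample_indices`, `disagreement_reward`, `weights`, `eps`, `scale`) are assumptions made for illustration only.

```python
import random
import statistics


def trajectory_priority(ret, recon_loss, value_error, weights=(1.0, 1.0, 1.0)):
    """Score one trajectory from the three signals named in the abstract.

    Assumption: a weighted sum, using the magnitude of the value error.
    The paper may combine these signals differently.
    """
    w_ret, w_rec, w_val = weights
    return w_ret * ret + w_rec * recon_loss + w_val * abs(value_error)


def sample_indices(priorities, k, rng, eps=1e-6):
    """Draw k trajectory indices with probability proportional to priority.

    Priorities are shifted so the smallest maps to eps, which keeps every
    sampling weight strictly positive even when scores are negative.
    """
    low = min(priorities)
    sampling_weights = [p - low + eps for p in priorities]
    return rng.choices(range(len(priorities)), weights=sampling_weights, k=k)


def disagreement_reward(reward_predictions, scale=1.0):
    """Intrinsic reward from the reward predictions of an ensemble.

    Assumption: disagreement is measured as the population variance of the
    ensemble members' predicted rewards for the same state and action.
    """
    return scale * statistics.pvariance(reward_predictions)


if __name__ == "__main__":
    rng = random.Random(0)
    # Three hypothetical trajectories: (return, reconstruction loss, value error).
    stats = [(0.0, 0.1, 0.0), (1.0, 0.5, -0.25), (0.0, 2.0, 1.0)]
    priorities = [trajectory_priority(*s) for s in stats]
    print("priorities:", priorities)
    print("sampled:", sample_indices(priorities, k=5, rng=rng))
    # Members that agree give zero bonus; members that disagree give a positive one.
    print("agree:", disagreement_reward([0.5, 0.5, 0.5]))
    print("disagree:", disagreement_reward([0.0, 1.0]))
```

Under these assumptions, trajectories with high return or large model and value errors are replayed more often, and the intrinsic bonus vanishes wherever the ensemble members agree. That matches the intuition behind the sparse-reward gains reported in the abstract: the agent is drawn toward states whose reward the world models cannot yet predict consistently.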