Probabilistic Dreaming for World Models
This work offers an incremental improvement for researchers working on sample-efficient learning in world models.
This paper introduces probabilistic innovations to the Dreamer world model that enable parallel exploration of latent states and maintain distinct hypotheses for mutually exclusive futures. On the MPE SimpleTag domain, the method achieves a 4.5% score improvement and 28% lower variance in episode returns compared to standard Dreamer.
"Dreaming" lets agents learn from imagined experiences, making the learning of world models more robust and sample-efficient. In this work, we consider innovations to the state-of-the-art Dreamer model using probabilistic methods that enable: (1) parallel exploration of many latent states; and (2) maintenance of distinct hypotheses for mutually exclusive futures, while retaining the desirable gradient properties of continuous latents. Evaluating on the MPE SimpleTag domain, our method outperforms standard Dreamer with a 4.5% score improvement and 28% lower variance in episode returns. We also discuss limitations and directions for future work, including how optimal hyperparameters (e.g., particle count K) scale with environmental complexity, and methods to capture epistemic uncertainty in world models.
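
To make the idea of exploring many latent states in parallel more concrete, the following is a minimal illustrative sketch, not the method evaluated in this work. It propagates K particles through a toy stochastic transition so that different particles can follow mutually exclusive futures (here, drifting left or right). The function names `rollout_particles` and `toy_transition`, the bimodal dynamics, and all numeric values are assumptions made purely for illustration.

```python
import numpy as np


def rollout_particles(z0, transition, horizon, num_particles, rng):
    """Propagate `num_particles` copies of the latent `z0` for `horizon` steps."""
    # Shape (K, D): every particle starts from the same latent state.
    particles = np.tile(z0, (num_particles, 1))
    trajectory = [particles]
    for _ in range(horizon):
        # All K particles are advanced in one vectorized call.
        particles = transition(particles, rng)
        trajectory.append(particles)
    # Shape (horizon + 1, K, D).
    return np.stack(trajectory)


def toy_transition(z, rng):
    """Hypothetical bimodal dynamics: each particle drifts left or right."""
    # One direction per particle, broadcast across latent dimensions.
    direction = rng.choice([-1.0, 1.0], size=(z.shape[0], 1))
    noise = 0.05 * rng.standard_normal(z.shape)
    return z + direction + noise


rng = np.random.default_rng(0)
traj = rollout_particles(
    np.zeros(2), toy_transition, horizon=5, num_particles=8, rng=rng
)
print(traj.shape)  # (6, 8, 2)
```

In this toy setting each particle commits to one of the two futures at every step instead of averaging them, which is the intuition behind keeping distinct hypotheses. The particle count K (`num_particles` here) is the kind of hyperparameter whose scaling with environmental complexity is left to future work.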