Next Embedding Prediction Makes World Models Stronger
This work targets researchers and practitioners applying model-based reinforcement learning in complex, partially observable environments, and presents an incremental yet effective improvement.
The authors tackle model-based reinforcement learning in partially observable, high-dimensional domains. On the DeepMind Control Suite, their approach matches or exceeds the performance of leading agents; on a subset of DMLab tasks, it achieves substantial gains.
Capturing temporal dependencies is critical for model-based reinforcement learning (MBRL) in partially observable, high-dimensional domains. We introduce NE-Dreamer, a decoder-free MBRL agent that leverages a temporal transformer to predict next-step encoder embeddings from latent state sequences, directly optimizing temporal predictive alignment in representation space. This approach enables NE-Dreamer to learn coherent, predictive state representations without reconstruction losses or auxiliary supervision. On the DeepMind Control Suite, NE-Dreamer matches or exceeds the performance of DreamerV3 and leading decoder-free agents. On a challenging subset of DMLab tasks involving memory and spatial reasoning, NE-Dreamer achieves substantial gains. These results establish next-embedding prediction with temporal transformers as an effective, scalable framework for MBRL in complex, partially observable environments.
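The abstract describes the training signal only at a high level: a temporal transformer reads a sequence of latent states and predicts the encoder embedding of the next step, with the loss computed in embedding space rather than through a pixel decoder. The NumPy sketch below illustrates that idea under explicit assumptions. The single-head causal attention layer, the linear output head, the negative-cosine-similarity loss, and all function names and dimensions are illustrative choices, not NE-Dreamer's actual architecture or objective. In an autodiff framework the targets would typically be wrapped in a stop-gradient; that too is an assumption, and NumPy simply has no gradients here.

```python
import numpy as np


def init_params(d_latent, d_embed, rng):
    """Hypothetical parameters: one attention head plus a linear prediction head."""
    s = 1.0 / np.sqrt(d_latent)
    return {
        "wq": rng.normal(0.0, s, (d_latent, d_latent)),
        "wk": rng.normal(0.0, s, (d_latent, d_latent)),
        "wv": rng.normal(0.0, s, (d_latent, d_latent)),
        "w_out": rng.normal(0.0, s, (d_latent, d_embed)),
    }


def causal_self_attention(x, wq, wk, wv):
    """Single-head causal self-attention over a (T, d) sequence."""
    T, d = x.shape
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(d)
    # Mask future positions so step t only attends to steps <= t.
    future = np.triu(np.ones((T, T)), k=1).astype(bool)
    scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over each row.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


def predict_next_embeddings(latents, params):
    """From latents h_1..h_t, predict e_{t+1}; returns (T-1, d_embed) predictions of e_2..e_T."""
    ctx = causal_self_attention(latents, params["wq"], params["wk"], params["wv"])
    # The last context vector has no next-step target inside the sequence, so drop it.
    return ctx[:-1] @ params["w_out"]


def next_embedding_loss(latents, embeddings, params):
    """Assumed objective: negative mean cosine similarity between predicted and actual next embeddings.

    No decoder or reconstruction term is involved; the loss lives in embedding space.
    """
    preds = predict_next_embeddings(latents, params)
    targets = embeddings[1:]  # treated as fixed targets (a stop-gradient in an autodiff framework)
    preds_n = preds / np.linalg.norm(preds, axis=1, keepdims=True)
    targets_n = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    cos = np.sum(preds_n * targets_n, axis=1)
    return float(-cos.mean())


# Usage with random stand-ins for world-model latents and encoder embeddings.
rng = np.random.default_rng(0)
T, d_latent, d_embed = 8, 16, 32
params = init_params(d_latent, d_embed, rng)
latents = rng.normal(size=(T, d_latent))
embeddings = rng.normal(size=(T, d_embed))
print(next_embedding_loss(latents, embeddings, params))
```

The causal mask is what makes this a *next*-step prediction: the prediction for step t+1 can draw on the whole history up to t but never on the target itself. The loss lies in [-1, 1] and reaches -1 only when every prediction points in the same direction as its target embedding.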