LG · Feb 1, 2025

EDELINE: Enhancing Memory in Diffusion-based World Models via Linear-Time Sequence Modeling

arXiv:2502.00466v2 · 3 citations · h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses memory limitations in world models for reinforcement learning agents, offering an incremental improvement over existing methods.

The paper tackles the problem of limited memory capacity in diffusion-based world models for reinforcement learning by introducing EDELINE, which integrates state space models with diffusion models, resulting in superior performance across Atari 100k, Crafter, and ViZDoom benchmarks.

World models represent a promising approach for training reinforcement learning agents with significantly improved sample efficiency. While most world model methods primarily rely on sequences of discrete latent variables to model environment dynamics, this compression often neglects critical visual details essential for reinforcement learning. Recent diffusion-based world models condition generation on a fixed context length of frames to predict the next observation, using separate recurrent neural networks to model rewards and termination signals. Although this architecture effectively enhances visual fidelity, the fixed context length inherently limits memory capacity. In this paper, we introduce EDELINE, a unified world model architecture that integrates state space models with diffusion models. Our approach outperforms existing baselines across visually challenging Atari 100k tasks, the memory-demanding Crafter benchmark, and 3D first-person ViZDoom environments.
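The memory limitation described in the abstract can be illustrated with a toy comparison. The sketch below is not EDELINE's architecture: it assumes scalar observations, and the function names `fixed_context_memory` and `linear_ssm_state` and the parameters `a` and `b` are hypothetical, chosen only for illustration. It contrasts a fixed-length frame window, which drops anything older than its context length, with a simple linear recurrence of the kind state space models build on, which updates a running state once per step (linear time in sequence length).

```python
from collections import deque


def fixed_context_memory(observations, context_len):
    """Return the frames a fixed-length context window still holds at the end."""
    window = deque(maxlen=context_len)  # oldest frames are dropped automatically
    for obs in observations:
        window.append(obs)
    return list(window)


def linear_ssm_state(observations, a=0.9, b=1.0):
    """Toy scalar recurrence h_t = a * h_{t-1} + b * x_t (one O(1) update per step).

    `a` and `b` are illustrative constants, not values from the paper.
    """
    h = 0.0
    for x in observations:
        h = a * h + b * x
    return h


# One informative event (1.0) at step 0, followed by nine blank frames.
observations = [1.0] + [0.0] * 9

window = fixed_context_memory(observations, context_len=4)
state = linear_ssm_state(observations)

print(window)       # the event at step 0 is no longer in the window
print(state > 0.0)  # the recurrent state still carries a decayed trace of it
```

In this toy setting the window holds only the last four blank frames, while the recurrent state remains nonzero because the early event has been folded into it with decay. How EDELINE actually combines the recurrent state with the diffusion model is specified in the paper, not here.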

