On Memory: A comparison of memory mechanisms in world models
This work addresses a bottleneck in world models for AI agents, namely their limited effective memory span, but its contribution is incremental: it compares existing memory mechanisms rather than introducing a new paradigm.
The paper tackles the limited memory span of transformer-based world models, which causes perceptual drift in long rollouts and hinders long-horizon planning. It finds that memory mechanisms improve the effective memory span and provide a path to completing loop closures in imagined trajectories.
World models enable agents to plan within imagined environments by predicting future states conditioned on past observations and actions. However, their ability to plan over long horizons is limited by the effective memory span of the backbone architecture. This limitation leads to perceptual drift in long rollouts, hindering the model's capacity to perform loop closures within imagined trajectories. In this work, we investigate the effective memory span of transformer-based world models through an analysis of several memory augmentation mechanisms. We introduce a taxonomy that distinguishes between memory encoding and memory injection mechanisms, motivating their roles in extending the world model's memory through the lens of residual stream dynamics. Using a state recall evaluation task, we measure the memory recall of each mechanism and analyze its respective trade-offs. Our findings show that memory mechanisms improve the effective memory span in vision transformers and provide a path to completing loop closures within a world model's imagination.
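As a rough illustration of the encoding/injection distinction only, the sketch below separates the two roles in a toy NumPy example. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `encode_memory` and `inject_memory` are hypothetical, chunked mean pooling stands in for a memory encoder, and single-head cross-attention without learned projections stands in for an injection step that writes into the residual stream.

```python
import numpy as np


def encode_memory(past_tokens: np.ndarray, num_slots: int) -> np.ndarray:
    """Toy memory encoding (assumption): compress T past token vectors
    into num_slots summary vectors by chunked mean pooling.
    Requires num_slots <= T so that no chunk is empty."""
    chunks = np.array_split(past_tokens, num_slots, axis=0)
    return np.stack([chunk.mean(axis=0) for chunk in chunks])


def inject_memory(residual: np.ndarray, memory: np.ndarray) -> np.ndarray:
    """Toy memory injection (assumption): cross-attend from the current
    residual stream to the memory slots and add the read-out back
    onto the residual stream."""
    d = residual.shape[-1]
    scores = residual @ memory.T / np.sqrt(d)
    # Numerically stable softmax over memory slots.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return residual + weights @ memory


# Usage: 12 past tokens of width 8 compressed into 4 slots,
# then injected into the residual stream of 3 current tokens.
rng = np.random.default_rng(0)
past = rng.normal(size=(12, 8))
current = rng.normal(size=(3, 8))
memory = encode_memory(past, num_slots=4)
updated = inject_memory(current, memory)
```

Keeping the two functions separate mirrors the distinction the abstract draws: how past observations are summarized is one choice, and where and how that summary re-enters the model is another, so either side could be swapped independently in a comparison.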