The Impact of Negative Sampling on Contrastive Structured World Models
This work addresses performance optimization of world models for reinforcement learning, but its contribution is incremental because it focuses on tuning existing methods.
The paper investigates how the choice of negative sampling strategy in the contrastive loss affects the performance of contrastive structured world models, and shows that leveraging time step correlations can double performance on previously studied Atari datasets.
World models trained by contrastive learning are a compelling alternative to autoencoder-based world models, which learn by reconstructing pixel states. In this paper, we describe three cases where small changes in how we sample negative states in the contrastive loss lead to drastic changes in model performance. In previously studied Atari datasets, we show that leveraging time step correlations can double the performance of the Contrastive Structured World Model. We also collect a full version of the datasets to study contrastive learning under a more diverse set of experiences.
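To make the role of negative sampling concrete, the following is a minimal sketch of a hinge-style contrastive loss with two interchangeable negative samplers. It is an illustration under stated assumptions, not the paper's implementation: the loss form (squared Euclidean distance plus a margin hinge), the function names, and the "same time step" sampler are all hypothetical stand-ins for the kind of change the abstract describes. The exact sampling rules studied in the paper are not given in the text above.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two latent vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hinge_contrastive_loss(z_pred, z_next, z_neg, margin=1.0):
    """Batch mean of the positive distance plus a hinge on the negative distance.

    Assumed loss form for illustration; `margin` is a hypothetical default.
    """
    total = 0.0
    for pred, nxt, neg in zip(z_pred, z_next, z_neg):
        positive = sq_dist(pred, nxt)                    # pull the prediction toward the true next state
        negative = max(0.0, margin - sq_dist(neg, nxt))  # push the negative at least `margin` away
        total += positive + negative
    return total / len(z_pred)


def negatives_uniform(states, rng):
    """Baseline sampler: each negative is a uniformly random state from the batch."""
    return [rng.choice(states) for _ in states]


def negatives_same_time_step(states, time_steps, rng):
    """Illustrative sampler: each negative shares the anchor's time step index."""
    negatives = []
    for i, t in enumerate(time_steps):
        candidates = [j for j, tj in enumerate(time_steps) if tj == t and j != i]
        j = rng.choice(candidates) if candidates else i  # fall back to the anchor itself if it is alone
        negatives.append(states[j])
    return negatives


# Toy batch: four 2-D latent states drawn from two time steps.
rng = random.Random(0)
states = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0], [5.0, 5.0]]
time_steps = [0, 0, 1, 1]

negatives = negatives_same_time_step(states, time_steps, rng)
loss = hinge_contrastive_loss(states, states, negatives)
```

Only the sampler changes between the two variants, while the loss function stays fixed. This mirrors the abstract's point that a small change in how negatives are drawn can change what the model is trained to separate.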