LG | AI | Sep 25, 2025

Model-Based Reinforcement Learning under Random Observation Delays

arXiv:2509.20869v1 | 2 citations | h-index: 4
Originality: Incremental advance
AI Analysis

The paper addresses a practical issue for real-world RL applications, where sensor delays are common, but it is an incremental improvement over existing methods.

The paper tackles random observation delays in reinforcement learning, a setting in which standard RL algorithms assume that observations arrive instantaneously. It proposes a model-based filtering process that updates the belief state as delayed observations arrive. The resulting method consistently outperforms delay-aware baselines and is robust to shifts in the delay distribution.

Delays frequently occur in real-world environments, yet standard reinforcement learning (RL) algorithms often assume instantaneous perception of the environment. We study random sensor delays in POMDPs, where observations may arrive out-of-sequence, a setting that has not been previously addressed in RL. We analyze the structure of such delays and demonstrate that naive approaches, such as stacking past observations, are insufficient for reliable performance. To address this, we propose a model-based filtering process that sequentially updates the belief state based on an incoming stream of observations. We then introduce a simple delay-aware framework that incorporates this idea into model-based RL, enabling agents to effectively handle random delays. Applying this framework to Dreamer, we compare our approach to delay-aware baselines developed for MDPs. Our method consistently outperforms these baselines and demonstrates robustness to delay distribution shifts during deployment. Additionally, we present experiments on simulated robotic tasks, comparing our method to common practical heuristics and emphasizing the importance of explicitly modeling observation delays.
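The filtering idea in the abstract can be illustrated on a toy problem. The sketch below is not the paper's Dreamer-based method: it assumes a small tabular POMDP with a known transition and observation model, and every name in it (`DelayedObservationFilter`, `TRANS`, `OBS`, `run`) is hypothetical. Each observation carries the step at which it was emitted, so a late or out-of-sequence arrival is placed at its correct position in time and the belief is replayed forward from there.

```python
from typing import Dict, List, Sequence, Tuple

Belief = List[float]


def normalize(weights: Sequence[float]) -> Belief:
    total = sum(weights)
    if total == 0.0:
        raise ValueError("observation has zero likelihood under the model")
    return [w / total for w in weights]


def predict(belief: Belief, trans: Sequence[Sequence[float]]) -> Belief:
    """One-step prediction: b'(s2) = sum over s of b(s) * trans[s][s2]."""
    n = len(belief)
    return [sum(belief[s] * trans[s][s2] for s in range(n)) for s2 in range(n)]


def correct(belief: Belief, obs_model: Sequence[Sequence[float]], obs: int) -> Belief:
    """Bayes correction: b(s) is proportional to b(s) * obs_model[s][obs]."""
    return normalize([b * obs_model[s][obs] for s, b in enumerate(belief)])


class DelayedObservationFilter:
    """Toy belief filter that accepts observations tagged with their emission step."""

    def __init__(self, prior, trans, obs_model):
        self.prior = list(prior)
        self.trans = trans          # trans[a][s][s2], assumed known
        self.obs_model = obs_model  # obs_model[s][o], assumed known
        self.actions: List[int] = []           # actions[k] is taken at step k
        self.observations: Dict[int, int] = {}  # emission step -> observation
        # beliefs[k]: belief over the state at step k, given the actions so far
        # and every received observation emitted at or before step k.
        self.beliefs: List[Belief] = [list(prior)]

    def step(self, action: int) -> None:
        """Advance time by one step after taking `action`."""
        self.actions.append(action)
        self.beliefs.append([])  # placeholder, filled by the replay below
        self._replay_from(len(self.beliefs) - 1)

    def receive(self, emitted_at: int, obs: int) -> None:
        """Incorporate an observation emitted at step `emitted_at`, possibly late."""
        if not 0 <= emitted_at < len(self.beliefs):
            raise ValueError("emission step is outside the elapsed horizon")
        self.observations[emitted_at] = obs
        self._replay_from(emitted_at)

    def _replay_from(self, start: int) -> None:
        # Recompute beliefs from `start` up to the current step.
        for k in range(start, len(self.beliefs)):
            if k == 0:
                b = list(self.prior)
            else:
                b = predict(self.beliefs[k - 1], self.trans[self.actions[k - 1]])
            if k in self.observations:
                b = correct(b, self.obs_model, self.observations[k])
            self.beliefs[k] = b

    def current_belief(self) -> Belief:
        return list(self.beliefs[-1])


# Illustrative two-state, one-action model (made-up numbers).
PRIOR = [0.5, 0.5]
TRANS = [[[0.9, 0.1], [0.2, 0.8]]]
OBS = [[0.8, 0.2], [0.3, 0.7]]


def run(arrivals_per_step: Sequence[Sequence[Tuple[int, int]]], actions: Sequence[int]) -> Belief:
    """arrivals_per_step[t] lists the (emitted_at, obs) pairs that arrive at step t."""
    f = DelayedObservationFilter(PRIOR, TRANS, OBS)
    for t, arrivals in enumerate(arrivals_per_step):
        if t > 0:
            f.step(actions[t - 1])
        for emitted_at, obs in arrivals:
            f.receive(emitted_at, obs)
    return f.current_belief()


# Same observations, delivered on time versus all late and out of sequence.
in_order = run([[(0, 0)], [(1, 1)], [(2, 1)]], actions=[0, 0])
out_of_order = run([[], [], [(2, 1), (0, 0), (1, 1)]], actions=[0, 0])
no_observations = run([[], [], []], actions=[0, 0])
```

In this toy setting the final belief depends only on which observations have arrived, not on the order in which they arrived. A fixed stack of recent observations does not carry that alignment with time by itself. Replaying from the emission step is the simplest choice here; it trades recomputation for clarity.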

