LG, AI | Jun 18, 2025

Zero-Shot Reinforcement Learning Under Partial Observability

Scott Jeen, Tom Bewley, Jonathan M. Cullen

arXiv:2506.15446v113.07 citations | h-index: 5 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a limitation of zero-shot RL in real-world applications, where full observability is often unavailable. The advance is incremental: it extends existing memory-based approaches to the zero-shot setting.

The paper tackles the problem of zero-shot reinforcement learning under partial observability, showing that standard methods degrade in such settings and that memory-based architectures improve performance over memory-free baselines in domains with partially observed states, rewards, and dynamics.

Recent work has shown that, under certain assumptions, zero-shot reinforcement learning (RL) methods can generalise to any unseen task in an environment after reward-free pre-training. Access to Markov states is one such assumption, yet, in many real-world applications, the Markov state is only partially observable. Here, we explore how the performance of standard zero-shot RL methods degrades when subjected to partial observability, and show that, as in single-task RL, memory-based architectures are an effective remedy. We evaluate our memory-based zero-shot RL methods in domains where the states, rewards and a change in dynamics are partially observed, and show improved performance over memory-free baselines. Our code is open-sourced via: https://enjeeneer.io/projects/bfms-with-memory/.
