LG, AI · Aug 8, 2025

In-Context Reinforcement Learning via Communicative World Models

Fernando Martinez-Lopez, Tao Li, Yingdong Lu, Juntao Chen

arXiv:2508.06659v1 · 7.1 · 1 citation · h-index: 1

Originality: Highly original

AI Analysis

This work addresses in-context generalization for RL agents. The contribution is incremental, as it builds on existing methods for representation learning and communication.

The paper tackles a common failure of reinforcement learning agents: they generalize poorly to new tasks when their parameters cannot be updated. It introduces CORAL, a framework that uses emergent communication to learn transferable representations, and reports significant gains in sample efficiency as well as successful zero-shot adaptation in unseen sparse-reward environments.

Reinforcement learning (RL) agents often struggle to generalize to new tasks and contexts without updating their parameters, mainly because their learned representations and policies are overfit to the specifics of their training environments. To boost agents' in-context RL (ICRL) ability, this work formulates ICRL as a two-agent emergent communication problem and introduces CORAL (Communicative Representation for Adaptive RL), a framework that learns a transferable communicative context by decoupling latent representation learning from control. In CORAL, an Information Agent (IA) is pre-trained as a world model on a diverse distribution of tasks. Its objective is not to maximize task reward, but to build a world model and distill its understanding into concise messages. The emergent communication protocol is shaped by a novel Causal Influence Loss, which measures the effect that the message has on the next action. During deployment, the previously trained IA serves as a fixed contextualizer for a new Control Agent (CA), which learns to solve tasks by interpreting the provided communicative context. Our experiments demonstrate that this approach enables the CA to achieve significant gains in sample efficiency and to successfully perform zero-shot adaptation, with the help of the pre-trained IA, in entirely unseen sparse-reward environments, validating the efficacy of learning a transferable communicative representation.
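To make the IA/CA division of labor concrete, here is a minimal plain-Python sketch. It is not the authors' implementation, and several parts are assumptions. The IA is a hypothetical frozen linear-tanh encoder that sees only the current observation, whereas the paper's IA is a world model. The CA is a hypothetical linear softmax policy. The Causal Influence Loss is read as one plausible interpretation: the KL divergence between the CA's next-action distribution given the IA's message and given an all-zero "null" message, negated so that minimizing the loss favors messages that change the next action. The names `frozen_ia_message`, `ca_policy`, and `causal_influence_loss`, as well as the null-message baseline, are illustrative only. The paper's exact loss may differ.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Plain matrix-vector product: one output per row of w."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def frozen_ia_message(obs: Sequence[float], w_ia: Matrix) -> Vector:
    """Hypothetical pre-trained IA, frozen at deployment: maps the current
    observation to a short continuous message. w_ia is never updated here."""
    return [math.tanh(v) for v in matvec(w_ia, obs)]


def ca_policy(obs: Sequence[float], msg: Sequence[float],
              w_obs: Matrix, w_msg: Matrix) -> Vector:
    """Hypothetical linear Control Agent: next-action probabilities computed
    from the observation plus the IA's message (the communicative context)."""
    logits = [a + b for a, b in zip(matvec(w_obs, obs), matvec(w_msg, msg))]
    return softmax(logits)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; q must be strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def causal_influence_loss(obs: Sequence[float], msg: Sequence[float],
                          w_obs: Matrix, w_msg: Matrix) -> float:
    """Assumed form of the loss: negative KL between the CA's next-action
    distribution with the message and with an all-zero 'null' message.
    It is 0 when the message has no effect and negative otherwise."""
    null_msg = [0.0] * len(msg)
    with_msg = ca_policy(obs, msg, w_obs, w_msg)
    without_msg = ca_policy(obs, null_msg, w_obs, w_msg)
    return -kl_divergence(with_msg, without_msg)


# Usage with toy weights: 2-dim observation, 2-dim message, 3 actions.
w_ia = [[1.0, -1.0], [0.5, 0.5]]
w_obs = [[0.1, 0.0], [0.0, 0.1], [0.0, 0.0]]
w_msg = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
obs = [1.0, 0.0]

msg = frozen_ia_message(obs, w_ia)           # IA acts as a fixed contextualizer
probs = ca_policy(obs, msg, w_obs, w_msg)    # CA conditions on the message
loss = causal_influence_loss(obs, msg, w_obs, w_msg)
print("message:", msg)
print("action probs:", probs)
print("causal influence loss:", loss)
```

The null-message baseline is only one possible counterfactual. Averaging over other messages the IA could have sent is another option. In the sketch, deployment corresponds to keeping `w_ia` fixed and training only the CA weights (`w_obs`, `w_msg`). This mirrors the abstract's description of a fixed IA serving a new CA.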

