Causal Knowledge Transfer for Multi-Agent Reinforcement Learning in Dynamic Environments
This work addresses efficient adaptation of multi-agent systems in dynamic settings. It appears incremental, since it extends existing knowledge transfer methods with a causal approach.
The paper tackles knowledge transfer across agents in non-stationary multi-agent reinforcement learning environments, where traditional methods generalize poorly and require costly retraining. It introduces a causal knowledge transfer framework in which agents share compact causal representations. With this framework, agents with heterogeneous goals bridge about half the gap between random exploration and a fully retrained policy when adapting to new environments.
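One plausible way to read "about half the gap" is as a normalized performance ratio. The source does not define the metric, so the symbols below are assumptions introduced for illustration: $R_{\text{random}}$, $R_{\text{retrained}}$, and $R_{\text{transfer}}$ denote the cumulative reward of random exploration, of a fully retrained policy, and of an agent using transferred causal knowledge, respectively.

```latex
\[
  \rho \;=\; \frac{R_{\text{transfer}} - R_{\text{random}}}
                  {R_{\text{retrained}} - R_{\text{random}}},
  \qquad
  \rho = 0 \ \text{(no better than random)}, \quad
  \rho = 1 \ \text{(matches full retraining)}.
\]
```

Under this reading, the reported result corresponds to $\rho \approx 0.5$.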
[Context] Multi-agent reinforcement learning (MARL) has achieved notable success in environments where agents must learn coordinated behaviors. However, transferring knowledge across agents remains challenging in non-stationary environments with changing goals. [Problem] Traditional knowledge transfer methods in MARL struggle to generalize, and agents often require costly retraining to adapt. [Approach] This paper introduces a causal knowledge transfer framework that enables RL agents to learn and share compact causal representations of paths within a non-stationary environment. When the environment changes (for example, when new obstacles appear), agents collide with obstacles and need adaptive recovery strategies. We model each collision as a causal intervention, instantiated as a sequence of recovery actions (a macro) whose effect encodes causal knowledge of how to circumvent the obstacle while increasing the agent's chances of achieving its goal (maximizing cumulative reward). This recovery macro is transferred online from a second agent and applied zero-shot, i.e., without retraining, simply by querying a lookup model with local context information about the collision. [Results] Our findings reveal two key insights: (1) agents with heterogeneous goals were able to bridge about half of the gap between random exploration and a fully retrained policy when adapting to new environments, and (2) the impact of causal knowledge transfer depends on the interplay between environment complexity and agents' heterogeneous goals.
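The zero-shot transfer step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the grid world, the `CollisionContext` encoding, the `MacroLookup` class, and the `apply_macro` helper are all hypothetical names chosen for illustration, and the lookup model is assumed here to be a plain dictionary keyed by local collision context.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Action = str  # one of "up", "down", "left", "right"
Position = Tuple[int, int]

# Grid displacement for each action (y grows downward); an assumed convention.
MOVES: Dict[Action, Position] = {
    "up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0),
}


@dataclass(frozen=True)
class CollisionContext:
    """Local information at the moment of collision (hypothetical encoding)."""
    blocked_action: Action  # the action that caused the collision
    goal_direction: Action  # coarse direction of the agent's goal


@dataclass
class MacroLookup:
    """Assumed lookup model: maps a collision context to a recovery macro."""
    table: Dict[CollisionContext, List[Action]] = field(default_factory=dict)

    def store(self, ctx: CollisionContext, macro: List[Action]) -> None:
        self.table[ctx] = list(macro)

    def query(self, ctx: CollisionContext) -> Optional[List[Action]]:
        # Returns None when no macro is known for this context.
        return self.table.get(ctx)


def apply_macro(pos: Position, macro: List[Action],
                obstacles: Set[Position]) -> Tuple[Position, bool]:
    """Apply the macro step by step; stop early if a step hits an obstacle."""
    x, y = pos
    for action in macro:
        dx, dy = MOVES[action]
        nxt = (x + dx, y + dy)
        if nxt in obstacles:
            return (x, y), False
        x, y = nxt
    return (x, y), True


# A second agent has already learned how to get around an obstacle
# that blocks a move to the right, and shares the macro.
shared = MacroLookup()
ctx = CollisionContext(blocked_action="right", goal_direction="right")
shared.store(ctx, ["up", "right", "right", "down"])

# The first agent collides in the same local context and reuses the
# macro without any retraining (zero-shot).
obstacles = {(1, 1)}
macro = shared.query(ctx)
final_pos, ok = apply_macro((0, 1), macro, obstacles) if macro else ((0, 1), False)
```

In this sketch the receiving agent ends up on the far side of the obstacle without updating its policy. A dictionary is the simplest possible lookup model; a learned model that generalizes across similar contexts would fill the same role.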