AI · Jul 18, 2025

Causal Knowledge Transfer for Multi-Agent Reinforcement Learning in Dynamic Environments

Kathrin Korte, Christian Medeiros Adriano, Sona Ghahremani, Holger Giese

arXiv:2507.13846v1 · 7.83 citations · h-index: 7 · 2025 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C)

Originality: Incremental advance

AI Analysis

This work addresses efficient adaptation of multi-agent systems in dynamic settings. It appears incremental, since it extends existing knowledge transfer methods with a causal approach.

The paper tackles the problem of transferring knowledge across agents in non-stationary multi-agent reinforcement learning environments, where traditional methods struggle with generalization and require costly retraining. It introduces a causal knowledge transfer framework that enables agents to share compact causal representations. With these transferred representations, agents with heterogeneous goals bridged about half the gap between random exploration and a fully retrained policy when adapting to new environments.
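One plausible way to read "bridging about half the gap" is as a normalized score: the transfer agent's return, rescaled so that random exploration maps to 0 and the fully retrained policy maps to 1. The sketch below assumes that reading. The function name `gap_closed` and the numeric returns are illustrative placeholders, not the paper's metric definition or its reported values.

```python
def gap_closed(transfer_return: float,
               random_return: float,
               retrained_return: float) -> float:
    """Fraction of the random-to-retrained gap recovered by the transfer agent.

    0.0 means no better than random exploration; 1.0 means on par with
    a fully retrained policy. This normalization is an assumption.
    """
    gap = retrained_return - random_return
    if gap == 0:
        raise ValueError("baselines coincide, so the gap is undefined")
    return (transfer_return - random_return) / gap


# Placeholder returns chosen for illustration only (not from the paper).
score = gap_closed(transfer_return=50.0, random_return=0.0, retrained_return=100.0)
print(score)
```

Under this reading, a score near 0.5 would correspond to the "about half" result described above.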

[Context] Multi-agent reinforcement learning (MARL) has achieved notable success in environments where agents must learn coordinated behaviors. However, transferring knowledge across agents remains challenging in non-stationary environments with changing goals. [Problem] Traditional knowledge transfer methods in MARL struggle to generalize, and agents often require costly retraining to adapt. [Approach] This paper introduces a causal knowledge transfer framework that enables RL agents to learn and share compact causal representations of paths within a non-stationary environment. When the environment changes (new obstacles appear), agents collide and need adaptive recovery strategies. We model each collision as a causal intervention, instantiated as a sequence of recovery actions (a macro) whose effect encodes causal knowledge of how to circumvent the obstacle while increasing the chances of achieving the agent's goal (maximizing cumulative reward). This recovery macro is transferred online from a second agent and applied in a zero-shot fashion, i.e., without retraining, simply by querying a lookup model with local context information (collisions). [Results] Our findings reveal two key insights: (1) agents with heterogeneous goals were able to bridge about half of the gap between random exploration and a fully retrained policy when adapting to new environments, and (2) the impact of causal knowledge transfer depends on the interplay between environment complexity and agents' heterogeneous goals.
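The zero-shot transfer step in the approach can be pictured as a table lookup keyed by collision context. The following is a minimal sketch under stated assumptions, not the authors' implementation. The class `MacroLookup`, the context format `(heading, obstacle_kind)`, the action names, and the `reward_gain` selection rule are all hypothetical, since the abstract does not specify how the lookup model is structured.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical local context observed at a collision: (heading, obstacle kind).
Context = Tuple[str, str]
# A macro is an ordered sequence of primitive recovery actions.
Macro = List[str]


@dataclass
class MacroLookup:
    """Maps collision contexts to the best recovery macro seen so far."""
    table: Dict[Context, Tuple[Macro, float]] = field(default_factory=dict)

    def record(self, context: Context, macro: Macro, reward_gain: float) -> None:
        # Keep only the macro with the highest observed reward gain per context.
        best = self.table.get(context)
        if best is None or reward_gain > best[1]:
            self.table[context] = (list(macro), reward_gain)

    def query(self, context: Context) -> Optional[Macro]:
        # Zero-shot use: no retraining, only a lookup on local context.
        entry = self.table.get(context)
        return list(entry[0]) if entry is not None else None


# The experienced agent records macros that worked for it.
shared = MacroLookup()
shared.record(("north", "wall"), ["left", "up", "up", "right"], reward_gain=1.0)
shared.record(("north", "wall"), ["down", "down"], reward_gain=0.2)

# A second agent collides and queries the shared model with its local context.
macro = shared.query(("north", "wall"))
fallback = shared.query(("east", "pit"))  # unseen context: no macro available
print(macro, fallback)
```

When the query returns `None`, the receiving agent would fall back to its own policy or to exploration. How the paper handles unseen contexts is not stated in the abstract.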
