LG MA | Mar 3, 2025

Trajectory-Class-Aware Multi-Agent Reinforcement Learning

Hyungho Na, Kwanghyeon Lee, Sumin Lee, Il-Chul Moon

arXiv:2503.01440v17.13 citations | h-index: 2 | Has Code | ICLR

Originality: Incremental advance

AI Analysis

The paper addresses multi-task coordination in multi-agent systems, but the advance appears incremental because it builds on existing methods with specific enhancements.

The paper tackles generalization across multiple tasks in multi-agent reinforcement learning. It introduces TRAMA, in which agents recognize the task type by identifying the class of the trajectory they are experiencing, and it reports performance improvements over state-of-the-art baselines on multi-task problems built upon StarCraft II.

In the context of multi-agent reinforcement learning, generalization is the challenge of solving various tasks that may require different joint policies or coordination, without relying on policies specialized for each task. We refer to this type of problem as a multi-task problem, and we train agents to be versatile in this multi-task setting through a single training process. To address this challenge, we introduce TRajectory-class-Aware Multi-Agent reinforcement learning (TRAMA). In TRAMA, agents recognize a task type by identifying, through partial observations, the class of trajectories they are experiencing, and they use this trajectory awareness or prediction as additional information for their action policy. To this end, we introduce three primary objectives in TRAMA: (a) constructing a quantized latent space to generate trajectory embeddings that reflect key similarities among trajectories; (b) conducting trajectory clustering using these trajectory embeddings; and (c) building a trajectory-class-aware policy. Specifically for (c), we introduce a trajectory-class predictor that performs agent-wise predictions of the trajectory class, and we design a trajectory-class representation model for each trajectory class. Each agent takes actions based on this trajectory-class representation along with its partial observation for task-aware execution. The proposed method is evaluated on various tasks, including multi-task problems built upon StarCraft II. Empirical results show further performance improvements over state-of-the-art baselines.
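The three objectives in the abstract can be pictured with a small toy sketch. This is not the authors' implementation: the hand-written codebook, the mean-of-codes trajectory embedding, plain k-means, and the one-hot class feature are all assumptions standing in for TRAMA's learned quantized latent space, clustering step, trajectory-class predictor, and trajectory-class representation model. All function names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(vec, codebook):
    """(a) Map a latent vector to the index of its nearest codebook entry."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))

def embed_trajectory(latents, codebook):
    """(a) Toy trajectory embedding: mean of the quantized vectors (assumption)."""
    picked = [codebook[quantize(z, codebook)] for z in latents]
    return [sum(col) / len(picked) for col in zip(*picked)]

def kmeans(points, k, iters=20, seed=0):
    """(b) Plain k-means over trajectory embeddings; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def policy_input(observation, class_id, num_classes):
    """(c) Append a one-hot class feature to an agent's partial observation.

    Assumption: a one-hot vector replaces the learned class representation.
    """
    one_hot = [1.0 if c == class_id else 0.0 for c in range(num_classes)]
    return list(observation) + one_hot

# Toy data: two trajectories from each of two hypothetical tasks.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 6.0]]
trajectories = [
    [[0.1, -0.1], [0.9, 1.1]],   # task A
    [[0.2, 0.0], [0.1, 0.1]],    # task A
    [[5.1, 4.9], [6.2, 5.8]],    # task B
    [[5.9, 6.1], [6.0, 6.0]],    # task B
]
embeddings = [embed_trajectory(t, codebook) for t in trajectories]
centroids, labels = kmeans(embeddings, k=2)
agent_input = policy_input([0.3, 0.7], labels[0], num_classes=2)
```

In this sketch the class label comes straight from clustering; in the paper's setting each agent would instead predict the class from its own partial observations, which is the role of the trajectory-class predictor.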

