Dayi Dong

h-index5
2papers

2 Papers

7.5ROMay 13
MIMIC-D: Multi-modal Imitation for MultI-agent Coordination with Decentralized Diffusion Policies

Dayi Dong, Maulik Bhatt, Seoyeon Choi et al.

As robots become more integrated in society, their ability to coordinate with other robots and humans on multi-modal tasks (those with multiple valid solutions) is crucial. Such behaviors can be learned from expert demonstrations via imitation learning (IL), but when expert demonstrations are multi-modal, standard IL approaches usually average across modes or collapse to a single mode, preventing effective coordination. Being inspired by diffusion models' ability to capture complex multi-modal trajectory distributions in single-agent settings, we develop a diffusion-based framework for coordinated multi-modal behavior in multi-agent systems. However, existing multi-agent diffusion approaches typically require a centralized planner or explicit communication among agents. This assumption can fail in real-world scenarios where robots must operate independently or with agents like humans that they cannot directly communicate with. Therefore, we propose MIMIC-D, a joint training with decentralized execution paradigm for multi-modal multi-agent IL via diffusion. We jointly train all agents' policies with only local information to achieve implicit coordination. In simulation and hardware experiments, our method exhibits robust multi-modal coordination behavior in various tasks and environments, improving upon state-of-the-art baselines.

MAFeb 24, 2025
Leveraging Large Language Models for Effective and Explainable Multi-Agent Credit Assignment

Kartik Nagpal, Dayi Dong, Jean-Baptiste Bouvier et al.

Recent work, spanning from autonomous vehicle coordination to in-space assembly, has shown the importance of learning collaborative behavior for enabling robots to achieve shared goals. A common approach for learning this cooperative behavior is to utilize the centralized-training decentralized-execution paradigm. However, this approach also introduces a new challenge: how do we evaluate the contributions of each agent's actions to the overall success or failure of the team. This credit assignment problem has remained open, and has been extensively studied in the Multi-Agent Reinforcement Learning literature. In fact, humans manually inspecting agent behavior often generate better credit evaluations than existing methods. We combine this observation with recent works which show Large Language Models demonstrate human-level performance at many pattern recognition tasks. Our key idea is to reformulate credit assignment to the two pattern recognition problems of sequence improvement and attribution, which motivates our novel LLM-MCA method. Our approach utilizes a centralized LLM reward-critic which numerically decomposes the environment reward based on the individualized contribution of each agent in the scenario. We then update the agents' policy networks based on this feedback. We also propose an extension LLM-TACA where our LLM critic performs explicit task assignment by passing an intermediary goal directly to each agent policy in the scenario. Both our methods far outperform the state-of-the-art on a variety of benchmarks, including Level-Based Foraging, Robotic Warehouse, and our new Spaceworld benchmark which incorporates collision-related safety constraints. As an artifact of our methods, we generate large trajectory datasets with each timestep annotated with per-agent reward information, as sampled from our LLM critics.