CV | Mar 13, 2025

Enhancing Multi-Agent Systems via Reinforcement Learning with LLM-based Planner and Graph-based Policy

Ziqi Jia, Junjie Li, Xiaoyang Qu, Jianzong Wang

arXiv:2503.10049v1 | 18.212 citations | h-index: 11 | ICRA

Originality: Incremental advance

AI Analysis

This work addresses coordination problems in multi-agent systems for applications like robotics or simulations, but it appears incremental as it builds on existing LLM and MARL methods.

The paper tackles coordination and safety challenges in multi-agent systems by proposing LGC-MARL, a framework that combines LLM-based planning and graph-based policies, achieving superior performance and scalability on the AI2-THOR simulation platform.

Multi-agent systems (MAS) have shown great potential in executing complex tasks, but coordination and safety remain significant challenges. Multi-Agent Reinforcement Learning (MARL) offers a promising framework for agent collaboration, but it faces difficulties in handling complex tasks and designing reward functions. The introduction of Large Language Models (LLMs) has brought stronger reasoning and cognitive abilities to MAS, but existing LLM-based systems struggle to respond quickly and accurately in dynamic environments. To address these challenges, we propose LLM-based Graph Collaboration MARL (LGC-MARL), a framework that efficiently combines LLMs and MARL. This framework decomposes complex tasks into executable subtasks and achieves efficient collaboration among multiple agents through graph-based coordination. Specifically, LGC-MARL consists of two main components: an LLM planner and a graph-based collaboration meta policy. The LLM planner transforms complex task instructions into a series of executable subtasks, evaluates the rationality of these subtasks using a critic model, and generates an action dependency graph. The graph-based collaboration meta policy facilitates communication and collaboration among agents based on the action dependency graph, and adapts to new task environments through meta-learning. Experimental results on the AI2-THOR simulation platform demonstrate the superior performance and scalability of LGC-MARL in completing various complex tasks.
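
To make the idea of an action dependency graph concrete, here is a minimal Python sketch. It is not the authors' implementation: the subtask names, the `dependencies` mapping, and the `execution_waves` helper are all hypothetical, and the LLM planner and critic are assumed to have already produced the graph. The sketch only shows how such a graph could be turned into groups of subtasks that agents might carry out in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical subtasks (not taken from the paper). Each subtask maps to the
# set of subtasks that must finish before it can start.
dependencies = {
    "open_fridge": set(),
    "pick_up_apple": {"open_fridge"},
    "clear_counter": set(),
    "place_apple_on_counter": {"pick_up_apple", "clear_counter"},
}


def execution_waves(deps):
    """Group subtasks into waves; subtasks in the same wave do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        # Every subtask whose prerequisites are all done is ready now.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(dependencies)
for index, wave in enumerate(waves, start=1):
    print(f"wave {index}: {wave}")
```

In this toy graph, `open_fridge` and `clear_counter` have no prerequisites, so two agents could handle them at the same time, while `place_apple_on_counter` has to wait for both of its predecessors. A learned graph-based policy would replace this fixed scheduling rule; the sketch only illustrates the ordering constraints that the graph encodes.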
