LEED: A Highly Efficient and Scalable LLM-Empowered Expert Demonstrations Framework for Multi-Agent Reinforcement Learning
This paper addresses the problem of scaling multi-agent systems for complex decision-making, though the contribution appears incremental because it builds on existing methods such as LLMs and decentralized training.
The paper tackles the coordination and scalability bottleneck in multi-agent reinforcement learning by proposing LEED, a framework that uses large language models to generate expert demonstrations and integrates them with decentralized policy optimization. The authors report superior sample efficiency, time efficiency, and robust scalability compared to state-of-the-art baselines.
Multi-agent reinforcement learning (MARL) holds substantial promise for intelligent decision-making in complex environments. However, it suffers from a coordination and scalability bottleneck as the number of agents increases. To address these issues, we propose the LLM-empowered expert demonstrations framework for multi-agent reinforcement learning (LEED). LEED consists of two components: a demonstration generation (DG) module and a policy optimization (PO) module. Specifically, the DG module leverages large language models to generate instructions for interacting with the environment, thereby producing high-quality demonstrations. The PO module adopts a decentralized training paradigm, where each agent utilizes the generated demonstrations to construct an expert policy loss, which is then integrated with its own policy loss. This enables each agent to effectively personalize and optimize its local policy based on both expert knowledge and individual experience. Experimental results show that LEED achieves superior sample efficiency, time efficiency, and robust scalability compared to state-of-the-art baselines.
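The abstract does not give the exact form of the PO module's objective, so the following is only a minimal sketch of how one agent might combine an expert policy loss with its own policy loss. It assumes a behavior-cloning (negative log-likelihood) expert loss, a REINFORCE-style policy loss, and a weighted sum controlled by a hypothetical `expert_weight` coefficient. All function names, the tabular logits, and the weighting scheme are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expert_loss(policy_logits, demos):
    """Assumed expert loss: mean negative log-likelihood of demonstrated actions.

    policy_logits: dict mapping a state key to a list of action logits.
    demos: list of (state, action) pairs, e.g. produced by an LLM-guided rollout.
    """
    nll = 0.0
    for state, action in demos:
        probs = softmax(policy_logits[state])
        nll -= math.log(probs[action])
    return nll / len(demos)


def policy_gradient_loss(policy_logits, rollouts):
    """Assumed individual loss: REINFORCE surrogate, -mean(advantage * log pi(a|s)).

    rollouts: list of (state, action, advantage) triples from the agent's own experience.
    """
    total = 0.0
    for state, action, advantage in rollouts:
        probs = softmax(policy_logits[state])
        total -= advantage * math.log(probs[action])
    return total / len(rollouts)


def combined_loss(policy_logits, rollouts, demos, expert_weight=0.5):
    """Weighted sum of the agent's own loss and the expert loss (hypothetical weighting)."""
    own = policy_gradient_loss(policy_logits, rollouts)
    expert = expert_loss(policy_logits, demos)
    return own + expert_weight * expert


if __name__ == "__main__":
    # One agent with a tabular policy over two actions in a single state.
    logits = {"s0": [0.0, 0.0]}
    demos = [("s0", 0)]            # the demonstration chose action 0
    rollouts = [("s0", 1, 1.0)]    # the agent's own experience favored action 1
    print(combined_loss(logits, rollouts, demos, expert_weight=0.5))
```

In a decentralized setup, each agent would evaluate a loss of this kind on its own local policy and its own share of the demonstrations. Setting `expert_weight` to zero recovers a purely experience-driven update, and larger values pull the policy toward the demonstrated behavior.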