LG | AI | MA | Mar 25, 2025

LERO: LLM-driven Evolutionary framework with Hybrid Rewards and Enhanced Observation for Multi-Agent Reinforcement Learning

arXiv:2503.21807v1 | 7 citations | h-index: 2 | ICIC
Originality: Incremental advance
AI Analysis

The work addresses bottlenecks in MARL for cooperative tasks, though it appears incremental because it builds on existing evolutionary and LLM methods.

The paper tackles credit assignment and partial observability in multi-agent reinforcement learning by proposing LERO, a framework that integrates LLMs with evolutionary optimization to generate hybrid rewards and enhanced observations. The authors report improved task performance and training efficiency in Multi-Agent Particle Environments.

Multi-agent reinforcement learning (MARL) faces two critical bottlenecks distinct from single-agent RL: credit assignment in cooperative tasks and partial observability of environmental states. We propose LERO, a framework integrating large language models (LLMs) with evolutionary optimization to address these MARL-specific challenges. The solution centers on two LLM-generated components: a hybrid reward function that dynamically allocates individual credit through reward decomposition, and an observation enhancement function that augments partial observations with inferred environmental context. An evolutionary algorithm optimizes these components through iterative MARL training cycles, where top-performing candidates guide subsequent LLM generations. Evaluations in Multi-Agent Particle Environments (MPE) demonstrate LERO's superiority over baseline methods, with improved task performance and training efficiency.
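
The loop described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `Candidate`, `make_candidate`, `propose_candidates`, `train_and_evaluate`, and `evolve` are hypothetical names, the LLM call is replaced by a random perturbation of a single mixing weight, the MARL training run is replaced by a toy score, and keeping the previous top candidates in the pool (elitism) is an added assumption. Only the overall shape (generate reward and observation functions, score them through training, let the top performers seed the next generation) comes from the abstract.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence

Observation = List[float]


@dataclass
class Candidate:
    """Hypothetical container for one (reward function, observation function) pair."""
    reward_fn: Callable[[float, float], float]    # (team reward, individual term) -> agent reward
    obs_fn: Callable[[Observation], Observation]  # partial observation -> augmented observation
    weight: float                                 # stands in for the LLM-written function body
    fitness: float = float("-inf")


def make_candidate(w: float) -> Candidate:
    def reward_fn(team_reward: float, individual_term: float) -> float:
        # Hybrid reward (assumed form): blend the shared team signal with a per-agent term.
        return w * team_reward + (1.0 - w) * individual_term

    def obs_fn(obs: Observation) -> Observation:
        # Placeholder "inferred context": append the mean of the raw observation.
        return list(obs) + [sum(obs) / len(obs)]

    return Candidate(reward_fn, obs_fn, w)


def propose_candidates(parents: Sequence[Candidate], n: int, rng: random.Random) -> List[Candidate]:
    """Stand-in for the LLM generation step: perturb a parent instead of prompting a model."""
    out = []
    for _ in range(n):
        base = rng.choice(parents).weight if parents else rng.random()
        w = min(1.0, max(0.0, base + rng.gauss(0.0, 0.1)))
        out.append(make_candidate(w))
    return out


def train_and_evaluate(c: Candidate) -> float:
    """Stand-in for a full MARL training run; a toy score that peaks at weight 0.7."""
    return -(c.weight - 0.7) ** 2


def evolve(generations: int = 5, population: int = 8, top_k: int = 2, seed: int = 0) -> Candidate:
    rng = random.Random(seed)
    parents: List[Candidate] = []
    for _ in range(generations):
        # Assumption: previous top candidates stay in the pool (elitism).
        pool = propose_candidates(parents, population, rng) + list(parents)
        for c in pool:
            c.fitness = train_and_evaluate(c)
        pool.sort(key=lambda c: c.fitness, reverse=True)
        parents = pool[:top_k]  # top performers guide the next generation
    return parents[0]


if __name__ == "__main__":
    best = evolve()
    print(f"best weight={best.weight:.3f}, fitness={best.fitness:.5f}")
```

In a real setting, `propose_candidates` would prompt an LLM with the code and scores of the top candidates, and `train_and_evaluate` would run a MARL algorithm in MPE with the candidate's reward and observation functions plugged in; both are deliberately stubbed here.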

