Reinforcement Learning-Augmented LLM Agents for Collaborative Decision Making and Performance Optimization
The work addresses the challenge of reliable collaboration in complex workflows for users of multi-agent LLM systems. Its contribution is incremental, since it builds on existing Dec-POMDP and CTDE methods.
The paper tackles the problem of LLMs lacking collaborative awareness in multi-agent settings by introducing a reinforcement learning-augmented framework. It reports a 3x increase in task processing speed over single-agent baselines, 98.7% structural/style consistency in writing, and a 74.6% test pass rate in coding.
Large Language Models (LLMs) perform well in language tasks but often lack collaborative awareness and struggle to optimize global performance in multi-agent settings. We present a reinforcement learning-augmented LLM agent framework that formulates cooperation as a decentralized partially observable Markov decision process (Dec-POMDP) and adopts centralized training with decentralized execution (CTDE). We introduce Group Relative Policy Optimization (GRPO) to jointly optimize agent policies with access to global signals during training, together with a simplified joint reward that balances task quality, speed, and coordination cost. On collaborative writing and coding benchmarks, our framework delivers a 3x increase in task processing speed over single-agent baselines, 98.7% structural/style consistency in writing, and a 74.6% test pass rate in coding. The approach consistently outperforms strong multi-agent LLM baselines and provides a practical path toward reliable collaboration in complex workflows.
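To make the reward and optimization ideas in the abstract more concrete, here is a minimal Python sketch under stated assumptions. The abstract does not give the reward formula or its weights, so `joint_reward`, its linear form, and the weights `w_quality`, `w_speed`, and `w_coord` are hypothetical. The advantage computation follows the commonly described group-relative normalization (each rollout's reward minus the group mean, divided by the group standard deviation) and may differ from what the paper actually implements.

```python
from statistics import mean, pstdev


def joint_reward(quality, latency, messages,
                 w_quality=1.0, w_speed=0.1, w_coord=0.05):
    """Hypothetical team reward: credit task quality, penalize latency
    (a proxy for speed) and message count (a proxy for coordination cost).
    The linear form and the weights are illustrative assumptions."""
    return w_quality * quality - w_speed * latency - w_coord * messages


def group_relative_advantages(rewards, eps=1e-8):
    """Score each rollout relative to its own group: subtract the group
    mean and divide by the group standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three hypothetical joint rollouts of the same task by the agent team.
rollouts = [
    {"quality": 0.9, "latency": 2.0, "messages": 4},
    {"quality": 0.6, "latency": 1.0, "messages": 2},
    {"quality": 0.8, "latency": 4.0, "messages": 10},
]

rewards = [joint_reward(**r) for r in rollouts]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.3f} advantage={advantage:.3f}")
```

In a CTDE setup, a shared signal of this kind would be available only during training. At execution time each agent would act on its own local observation. The sketch leaves out the policy-gradient update itself.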