AI · Nov 25, 2025

DRAFT-RL: Multi-Agent Chain-of-Draft Reasoning for Reinforcement Learning-Enhanced LLMs

arXiv:2511.20468v1
Originality: Incremental advance
AI Analysis

The paper addresses the need for more robust and interpretable LLM agents in complex reasoning tasks, though it appears incremental because it builds on existing multi-agent RL frameworks.

The paper targets the limited structural diversity of multi-agent reinforcement learning for LLMs. It proposes DRAFT-RL, which integrates Chain-of-Draft reasoning to enable multi-path exploration and peer-guided reflection, and it reports higher accuracy and faster convergence than existing methods on tasks such as code synthesis and math.

Large Language Models (LLMs) have shown impressive capabilities in multi-step reasoning and problem-solving. Recent works introduce multi-agent reflection frameworks where multiple LLM agents critique and refine each other's outputs using reinforcement learning (RL). However, these approaches often rely on single-shot responses and lack structural diversity in reasoning exploration. In this paper, we propose DRAFT-RL, a novel framework that integrates Chain-of-Draft (CoD) reasoning into multi-agent RL training. Instead of generating single responses, each agent produces multiple drafts per query, which are then evaluated by peer agents and a learned reward model to identify the most promising trajectory. These selected drafts are used to refine future reasoning strategies through actor-critic learning. DRAFT-RL enables explicit multi-path exploration, peer-guided reflection, and reward-aligned selection, resulting in more robust and interpretable LLM agent behavior. We evaluate our method on complex reasoning tasks including code synthesis, symbolic math, and knowledge-intensive QA, demonstrating that DRAFT-RL outperforms existing reflective and RL-based agents by significant margins in both accuracy and convergence speed.
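
The draft-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how peer evaluations and the reward model are combined, so the weighted average, the `peer_weight` parameter, the rule that agents do not score their own drafts, and every name below are assumptions made for illustration. The mean-score baseline in `advantages` is a simple stand-in for the learned critic of an actor-critic setup.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Draft:
    agent_id: int  # which agent produced this draft
    text: str      # the draft reasoning trace or answer


# Hypothetical scorer type: maps a draft to a score (higher is better).
Scorer = Callable[[Draft], float]


def select_draft(
    drafts: Sequence[Draft],
    peer_scorers: Dict[int, Scorer],
    reward_model: Scorer,
    peer_weight: float = 0.5,
) -> Tuple[Draft, List[float]]:
    """Pick the draft with the highest combined peer and reward-model score.

    Assumption: the combined score is a weighted average of the mean peer
    score and the reward-model score, and an agent never scores its own draft.
    """
    if not drafts:
        raise ValueError("need at least one draft")
    combined: List[float] = []
    for draft in drafts:
        # Collect scores from every peer except the draft's own author.
        peer_scores = [
            scorer(draft)
            for agent_id, scorer in peer_scorers.items()
            if agent_id != draft.agent_id
        ]
        peer_mean = sum(peer_scores) / len(peer_scores) if peer_scores else 0.0
        combined.append(
            peer_weight * peer_mean + (1.0 - peer_weight) * reward_model(draft)
        )
    best_index = max(range(len(drafts)), key=combined.__getitem__)
    return drafts[best_index], combined


def advantages(scores: Sequence[float]) -> List[float]:
    """Score minus the mean score: a crude baseline, not a learned critic."""
    baseline = sum(scores) / len(scores)
    return [score - baseline for score in scores]


if __name__ == "__main__":
    # Toy scorers standing in for peer LLM agents and a reward model.
    drafts = [Draft(0, "draft a"), Draft(1, "draft b")]
    peers = {0: lambda d: 1.0, 1: lambda d: 0.5}
    best, scores = select_draft(drafts, peers, reward_model=lambda d: 0.5)
    print(best.text, scores, advantages(scores))
```

In a training loop, the advantage values would weight the policy update for each draft, so drafts rated above the baseline are reinforced and those below it are discouraged.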

