SE AI Sep 26, 2025

Reinforcement Learning-Guided Chain-of-Draft for Token-Efficient Code Generation

Xunzhu Tang, Iyiola Emmanuel Olatunji, Tiezhu Sun, Jacques Klein, Tegawende F. Bissyande

arXiv:2509.25243v1 1 citations h-index: 47 Has Code

Originality: Incremental advance

AI Analysis

The paper addresses token efficiency and cost reduction for users deploying LLMs for code generation, though its contribution is incremental because it builds on existing Chain-of-Draft methods.

The paper tackles inefficient, variable-quality code generation by LLMs under Chain-of-Draft prompting. It proposes a reinforcement learning framework that selects the best candidate solution, which reduces user billing by over 50% and improves response quality across benchmarks like MBPP and SWE-bench.

LLMs demonstrate surface-level fluency in code generation but struggle with structured reasoning tasks requiring correctness and semantic alignment. While Chain-of-Thought (CoT) prompting enhances reasoning through intermediate steps, it suffers from verbosity and inefficiency. Chain-of-Draft (CoD) prompting offers more concise reasoning, but the stochastic nature of LLMs produces varying solution quality, making optimal selection challenging. We propose MultiCoD, a reinforcement learning framework that learns to select the most promising candidate from CoD-generated solutions. Our approach uses strategy-guided prompting to encourage diverse reasoning styles and models solution selection as a contextual bandit problem. The framework optimizes interpretable features including code complexity, reasoning structure, and strategic metadata through a reward function balancing correctness, efficiency, and clarity. Experiments on MBPP, BigCodeBench, SWE-bench Verified, and Defects4J show that MultiCoD outperforms, and in some cases is on par with, standard prompting, CoT, and CoD baselines. It also achieves cost and token efficiency from the user's perspective through a multi-candidate design that charges only for the selected output, reducing user billing by over 50% and improving LLM response quality, which makes MultiCoD more sustainable and scalable for real-world deployment. Our code is available at https://anonymous.4open.science/r/MultiCoD.
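The candidate-selection step described in the abstract can be illustrated with a minimal sketch. It assumes a linear reward model with epsilon-greedy exploration, which is one common way to set up a contextual bandit and is not necessarily what MultiCoD uses. The feature layout, the reward weights (0.6 / 0.25 / 0.15), the token budget, and the names `reward` and `EpsilonGreedySelector` are all hypothetical and do not come from the paper or its code.

```python
import random
from typing import List, Sequence


def reward(passed: bool, tokens: int, clarity: float, max_tokens: int = 512) -> float:
    """Illustrative reward mixing correctness, efficiency, and clarity.

    The weights and the token budget are assumptions made for this sketch.
    """
    efficiency = max(0.0, 1.0 - tokens / max_tokens)  # fewer tokens -> higher score
    return 0.6 * float(passed) + 0.25 * efficiency + 0.15 * clarity


class EpsilonGreedySelector:
    """Contextual bandit with a linear reward estimate per candidate (hypothetical)."""

    def __init__(self, n_features: int, lr: float = 0.1,
                 epsilon: float = 0.1, seed: int = 0) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.lr = lr
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def score(self, features: Sequence[float]) -> float:
        # Predicted reward is a dot product of weights and candidate features.
        return sum(w * x for w, x in zip(self.weights, features))

    def select(self, candidates: Sequence[Sequence[float]]) -> int:
        # Explore with probability epsilon, otherwise pick the best-scoring candidate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(candidates))
        scores = [self.score(c) for c in candidates]
        return max(range(len(candidates)), key=scores.__getitem__)

    def update(self, features: Sequence[float], observed_reward: float) -> None:
        # One stochastic-gradient step on the squared prediction error.
        error = observed_reward - self.score(features)
        self.weights = [w + self.lr * error * x
                        for w, x in zip(self.weights, features)]


# Usage: each candidate is a feature vector, e.g. [code complexity, reasoning steps].
selector = EpsilonGreedySelector(n_features=2, epsilon=0.0)
drafts = [[0.9, 0.2], [0.3, 0.8]]
chosen = selector.select(drafts)
selector.update(drafts[chosen], reward(passed=True, tokens=128, clarity=0.7))
```

In this framing only the chosen draft is returned to the user, which is what allows a billing scheme that charges for the selected output alone.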

