CL · SE · May 27

Beyond pass@k: Redundancy-Aware RLVR for Multi-Sample Code Generation

arXiv:2605.28022 · 21.8 · h-index: 15
AI Analysis

For practitioners using LLMs for code generation, this work identifies and mitigates a previously overlooked redundancy issue in RLVR training, improving multi-sample performance.

The paper studies redundancy in LLM-generated code samples and shows that correctness-only RLVR leads to repeated implementations, while adding anti-redundancy rewards based on JPlag similarity improves finite-budget executable performance across 3 models and 3 benchmarks, often matching or outperforming specialized Pass@k-aware objectives.
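The central mechanism in this summary is an anti-redundancy term added to the correctness reward. The sketch below illustrates it under stated assumptions and is not the paper's implementation. JPlag, a Java tool that compares token streams using Greedy String Tiling, is replaced by a hypothetical `similarity` function: Jaccard overlap of whitespace-token 3-grams. The penalty form is also an assumption, namely correctness minus `lam` times each sample's maximum similarity to the rest of its group, as is the weight `lam = 0.5`. `group_redundancy` is a hypothetical group-level redundancy score (mean pairwise similarity).

```python
from itertools import combinations


def token_ngrams(code: str, n: int = 3) -> set[tuple[str, ...]]:
    """Hypothetical stand-in for JPlag's tokenizer: whitespace tokens grouped into n-grams."""
    tokens = code.split()
    if len(tokens) < n:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of token n-grams in [0, 1]; NOT JPlag's Greedy String Tiling."""
    ga, gb = token_ngrams(a, n), token_ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


def group_redundancy(samples: list[str]) -> float:
    """Mean pairwise similarity across a group of sampled programs (assumed metric)."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 0.0
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def anti_redundancy_rewards(samples: list[str], passed: list[bool],
                            lam: float = 0.5) -> list[float]:
    """Correctness reward minus lam * (max similarity to any other sample in the group)."""
    rewards = []
    for i, (code, ok) in enumerate(zip(samples, passed)):
        sims = [similarity(code, other) for j, other in enumerate(samples) if j != i]
        penalty = max(sims, default=0.0)
        rewards.append(float(ok) - lam * penalty)
    return rewards


# Two identical correct samples and one structurally different correct sample.
samples = [
    "def f(x): return x + 1",
    "def f(x): return x + 1",
    "def f(x):\n    y = x\n    return y + 1",
]
passed = [True, True, True]
print(anti_redundancy_rewards(samples, passed))  # [0.5, 0.5, 1.0]
print(group_redundancy(samples))
```

Here the two duplicates each drop to 0.5, and the distinct program keeps the full reward of 1.0. Taking the maximum rather than the mean similarity targets near-duplicates specifically. That choice is illustrative; the summary does not specify how the JPlag score enters the reward.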

LLMs for code generation are commonly evaluated in repeated-sampling settings using Pass@k, where multiple candidate programs are executed against unit tests under a finite sampling budget. While recent verifier-based reinforcement learning (RLVR) methods improve executable correctness, how these objectives affect redundancy among sampled programs remains poorly understood. In this work, we study implementation-level redundancy in code generation using JPlag, a plagiarism-detection system for code. Across models and benchmarks, we show that correctness-only RLVR often concentrates generations around repeated implementations, whereas Pass@k-aware objectives maintain lower redundancy and improve larger-budget performance. Motivated by these observations, we augment RLVR with direct anti-redundancy rewards based on JPlag similarity. Across 3 models and 3 benchmarks, discouraging near-duplicate generations reliably improves finite-budget executable performance, often matching or outperforming specialized Pass@k-aware objectives.
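The sketch below computes Pass@k with the unbiased estimator introduced alongside HumanEval (Chen et al., 2021). Given n samples per problem, of which c pass the unit tests, it computes pass@k = 1 - C(n-c, k) / C(n, k). The two toy "policies" use invented numbers, not results from the paper. They show why redundancy matters under a finite budget: the "concentrated" policy passes all or none of its samples on each problem, while the "diverse" one spreads its correct samples across problems. The helper `mean_pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which pass the tests."""
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Every size-k subset contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(pass_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over problems, given the number of passing samples per problem."""
    return sum(pass_at_k(n, c, k) for c in pass_counts) / len(pass_counts)


# Toy setting: 2 problems, n = 10 samples each.
concentrated = [10, 0]  # all-or-nothing per problem
diverse = [5, 5]        # correct samples spread across both problems
for k in (1, 5):
    print(k, mean_pass_at_k(concentrated, 10, k), mean_pass_at_k(diverse, 10, k))
```

Both policies score 0.5 at k = 1. At k = 5 the concentrated policy stays at 0.5, while the diverse one rises to 1 - 1/252, about 0.996. This is a toy version of the abstract's observation that lower redundancy accompanies better larger-budget performance.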
