SE | AI | Jan 27

AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion

Tianyue Jiang, Yanli Wang, Yanlin Wang, Daya Guo, Ensheng Shi, Yuchi Ma, Jiachi Chen, Zibin Zheng

arXiv:2601.19697v1 | 5.32 citations | h-index: 20 | ASE

Originality: Incremental advance

AI Analysis

This work improves code completion for developers by enhancing retrieval accuracy in repository contexts, though it is incremental as it builds on existing RAG methods.

The paper tackles repository-level code completion by addressing misalignment in retrieval-augmented generation, proposing AlignCoder with query enhancement and reinforcement learning, resulting in an 18.1% improvement in EM score on the CrossCodeEval benchmark.

Repository-level code completion remains a challenging task for existing code large language models (code LLMs) due to their limited understanding of repository-specific context and domain knowledge. While retrieval-augmented generation (RAG) approaches have shown promise by retrieving relevant code snippets as cross-file context, they suffer from two fundamental problems: misalignment between the query and the target code in the retrieval process, and the inability of existing retrieval methods to effectively utilize the inference information. To address these challenges, we propose AlignCoder, a repository-level code completion framework that introduces a query enhancement mechanism and a reinforcement-learning-based retriever training method. Our approach generates multiple candidate completions to construct an enhanced query that bridges the semantic gap between the initial query and the target code. Additionally, we employ reinforcement learning to train an AlignRetriever that learns to leverage inference information in the enhanced query for more accurate retrieval. We evaluate AlignCoder on two widely used benchmarks (CrossCodeEval and RepoEval) across five backbone code LLMs, demonstrating an 18.1% improvement in EM score compared to baselines on the CrossCodeEval benchmark. The results show that our framework achieves superior performance and exhibits high generalizability across various code LLMs and programming languages.
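The query enhancement idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation. It assumes a stubbed sampler in place of a code LLM and plain Jaccard token overlap in place of the RL-trained AlignRetriever, and every name in it (`build_enhanced_query`, `retrieve`, `fake_sampler`, the example snippets) is hypothetical. It only shows why appending candidate completions to the unfinished code can pull the query closer to the target snippet.

```python
import re
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Extract identifier-like tokens; a crude stand-in for a real code tokenizer."""
    return set(re.findall(r"[A-Za-z_][A-Za-z_0-9]*", text))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-set overlap; assumed here in place of a learned retriever score."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_enhanced_query(
    unfinished_code: str,
    sample_completions: Callable[[str, int], List[str]],
    n: int = 3,
) -> str:
    """Append n sampled candidate completions to the unfinished code."""
    candidates = sample_completions(unfinished_code, n)
    return "\n".join([unfinished_code, *candidates])


def retrieve(query: str, snippets: List[str], top_k: int = 1) -> List[str]:
    """Rank repository snippets by similarity to the query and keep the top_k."""
    q = tokenize(query)
    ranked = sorted(snippets, key=lambda s: jaccard(q, tokenize(s)), reverse=True)
    return ranked[:top_k]


# Hypothetical repository snippets: TARGET holds the code the completion needs,
# DISTRACTOR merely shares surface tokens with the unfinished line.
TARGET = (
    "class Cart:\n"
    "    def compute_total(self):\n"
    "        return sum(item.price for item in self.items)"
)
DISTRACTOR = "def log_event(cart, total):\n    print(cart, total)"
SNIPPETS = [DISTRACTOR, TARGET]

UNFINISHED = "total = cart."


def fake_sampler(prefix: str, n: int) -> List[str]:
    """Stub for a code LLM sampler; returns fixed guesses for illustration."""
    guesses = [
        "compute_total()",
        "compute_total(self.items)",
        "sum(item.price for item in items)",
    ]
    return guesses[:n]


plain_hit = retrieve(UNFINISHED, SNIPPETS)[0]
enhanced_hit = retrieve(build_enhanced_query(UNFINISHED, fake_sampler), SNIPPETS)[0]
```

In this toy setup the plain query shares tokens only with the distractor, while the enhanced query overlaps mostly with the target, so the two retrievals return different snippets. The candidate completions need not be correct; they only need to resemble the target code more than the unfinished prefix does.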
