RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion
This work addresses the challenge of improving code completion accuracy for developers by making better use of repository context, though it appears incremental, since it builds on existing retrieval and graph-based methods.
The authors address the problem that code large language models lack a full understanding of project repository context, which leads to less precise completions. They introduce RepoHyper, a framework that uses a semantic graph and retrieval methods to improve repository-level code completion, and they report that it markedly outperforms existing techniques, with enhanced accuracy across various datasets.
Code Large Language Models (CodeLLMs) have demonstrated impressive proficiency in code completion tasks. However, they often fall short of fully understanding the extensive context of a project repository, such as the intricacies of relevant files and class hierarchies, which can result in less precise completions. To overcome these limitations, we present RepoHyper, a multifaceted framework designed to address the complex challenges associated with repository-level code completion. Central to RepoHyper is the Repo-level Semantic Graph (RSG), a novel semantic graph structure that encapsulates the vast context of code repositories. Furthermore, RepoHyper leverages an Expand and Refine retrieval method, comprising a graph expansion step and a link prediction algorithm applied to the RSG, which enables the effective retrieval and prioritization of relevant code snippets. Our evaluations show that RepoHyper markedly outperforms existing techniques in repository-level code completion, showcasing enhanced accuracy across various datasets when compared to several strong baselines. Our implementation of RepoHyper can be found at https://github.com/FSoft-AI4Code/RepoHyper.
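To make the search, expand, and refine stages more concrete, the following is a minimal sketch under stated assumptions and not the RepoHyper implementation. The graph here is a plain adjacency dictionary standing in for the RSG, token-overlap (Jaccard) similarity stands in for a learned code encoder, and the refine step reuses that same heuristic where the framework applies a link prediction algorithm. All function names, node names, and parameters (`k`, `hops`, `top_n`) are hypothetical and chosen only for illustration.

```python
from collections import deque


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for a learned encoder."""
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def search(query, snippets, k):
    """Search: pick the k snippets most similar to the query as seed nodes."""
    ranked = sorted(snippets, key=lambda n: (-jaccard(query, snippets[n]), n))
    return ranked[:k]


def expand(seeds, edges, hops):
    """Expand: breadth-first walk of at most `hops` edges from the seeds."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not walk past the hop budget
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen


def refine(query, candidates, snippets, top_n):
    """Refine: re-rank the expanded candidates.

    Assumption: a simple similarity score replaces a trained link predictor.
    """
    ranked = sorted(candidates, key=lambda n: (-jaccard(query, snippets[n]), n))
    return ranked[:top_n]


def retrieve(query, snippets, edges, k=1, hops=1, top_n=2):
    """Chain the three stages: search, then expand, then refine."""
    seeds = search(query, snippets, k)
    candidates = expand(seeds, edges, hops)
    return refine(query, candidates, snippets, top_n)


# Hypothetical toy repository: node name -> snippet text.
snippets = {
    "utils.parse_config": "def parse_config path return dict",
    "models.User": "class User name email",
    "models.Admin": "class Admin User permissions",
    "db.save_user": "def save_user user db insert",
}
# Hypothetical semantic edges (inheritance, usage), stored in both directions.
edges = {
    "models.Admin": ["models.User"],
    "models.User": ["models.Admin", "db.save_user"],
    "db.save_user": ["models.User"],
}

result = retrieve("class Admin permissions check", snippets, edges)
print(result)  # ['models.Admin', 'models.User']
```

In this toy run, the search stage alone (with `k=1`) would return only `models.Admin`; the expand stage pulls in its parent class `models.User` through the graph edge, which illustrates why graph context can surface snippets that similarity search by itself ranks low.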