Relative Positioning Based Code Chunking Method For Rich Context Retrieval In Repository Level Code Completion Task With Code Language Model
This paper addresses the challenge of improving LLM-based code completion for developers, but the contribution appears incremental because it builds on existing retrieval methods.
The paper tackles the problem of selecting effective context for code completion. It proposes a repository-level preprocessing strategy that splits code into chunks, retrieves them by similarity, and uses the relative positioning of the retrieved chunks when building the final context. The reported result is improved code completion performance, although the abstract gives no concrete numbers.
Code completion can help developers work more efficiently and ease the development lifecycle. Although code completion is available in modern integrated development environments (IDEs), little research has examined what makes a good context for code completion, given the information available to the IDE, so that large language models (LLMs) perform better. In this paper, we describe an effective context collection strategy that helps LLMs perform better on code completion tasks. The key idea of our strategy is to preprocess the repository into smaller code chunks and then retrieve chunks using syntactic and semantic similarity, applying relative positioning to the retrieved chunks. We found that code chunking and the relative positioning of the chunks in the final context improve performance on code completion tasks.
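The abstract does not specify the chunk size, the similarity measures, or the ordering rule behind "relative positioning," so the following is only a minimal sketch under stated assumptions. It is not the authors' implementation. It assumes the following:

* Chunks are fixed-size, overlapping windows of lines.
* The syntactic signal is Jaccard similarity over identifier sets.
* Cosine similarity over identifier counts stands in for an embedding-based semantic signal.
* Relative positioning places the most relevant chunk last, immediately before the code being completed.

All names (`chunk_file`, `retrieve`, `build_context`, `window`, `stride`, `alpha`, `top_k`) are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    path: str
    start_line: int  # 0-based index of the chunk's first line in its file
    text: str


def chunk_file(path: str, source: str, window: int = 10, stride: int = 5) -> list[Chunk]:
    """Split a file into overlapping windows of `window` lines (assumed scheme)."""
    lines = source.splitlines()
    chunks = []
    for start in range(0, len(lines), stride):
        text = "\n".join(lines[start:start + window])
        if text.strip():
            chunks.append(Chunk(path, start, text))
        if start + window >= len(lines):  # this window already reaches the end of the file
            break
    return chunks


def tokenize(text: str) -> list[str]:
    # Identifier-like tokens only; punctuation and literals are ignored.
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text)


def jaccard(a: set[str], b: set[str]) -> float:
    # Syntactic similarity: overlap of identifier sets.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a: Counter, b: Counter) -> float:
    # Placeholder for a semantic (embedding) similarity: cosine over token counts.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, chunks: list[Chunk], top_k: int = 3, alpha: float = 0.5):
    """Score every chunk by a weighted mix of both similarities; return the top_k (score, chunk) pairs."""
    q_tokens = tokenize(query)
    q_set, q_bag = set(q_tokens), Counter(q_tokens)
    scored = []
    for c in chunks:
        c_tokens = tokenize(c.text)
        score = alpha * jaccard(q_set, set(c_tokens)) + (1 - alpha) * cosine(q_bag, Counter(c_tokens))
        scored.append((score, c))
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return scored[:top_k]


def build_context(prefix: str, retrieved) -> str:
    """Assumed relative positioning: least relevant chunk first, most relevant last, then the prefix."""
    parts = [f"# {c.path}:{c.start_line + 1}\n{c.text}"
             for _, c in sorted(retrieved, key=lambda sc: sc[0])]
    parts.append(prefix)
    return "\n\n".join(parts)


if __name__ == "__main__":
    repo = {
        "utils.py": "def add(a, b):\n    return a + b\n\ndef mul(a, b):\n    return a * b\n",
        "io_helpers.py": "import json\n\ndef load(path):\n    with open(path) as f:\n        return json.load(f)\n",
    }
    chunks = [c for path, src in repo.items() for c in chunk_file(path, src)]
    prefix = "result = add(x, y)\ntotal = mul("
    print(build_context(prefix, retrieve(prefix, chunks, top_k=2)))
```

In this toy repository, `utils.py` shares the identifiers `add` and `mul` with the prefix, so it scores highest and lands directly above the prefix. `io_helpers.py` shares nothing and ends up at the top of the context. Placing the strongest evidence nearest the cursor is one plausible reading of "relative positioning." The paper may define it differently, for example by keeping chunks in their original file order.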