SE, CL | Dec 15, 2023

A Review of Repository Level Prompting for LLMs

arXiv:2312.10101v1 | 1.71 citations | h-index: 2

Originality: Synthesis-oriented

AI Analysis

It addresses the problem of scaling LLM-based code generation from individual problems to entire repositories for software developers, though it is primarily a review of existing methods.

This paper reviews approaches for prompting large language models to generate code at the repository level, comparing techniques like Repository-Level Prompt Generation and RepoCoder to establish best practices for improving LLM performance in code generation tasks.

As coding challenges become more complex, recent advancements in Large Language Models (LLMs) have led to notable successes, such as achieving a 94.6% solve rate on the HumanEval benchmark. Concurrently, there is an increasing commercial push for repository-level inline code completion tools, such as GitHub Copilot and Tabnine, aimed at enhancing developer productivity. This paper delves into the transition from individual coding problems to repository-scale solutions, presenting a thorough review of the current literature on effective LLM prompting for code generation at the repository level. We examine approaches that work with black-box LLMs, so that they remain useful and applicable to commercial use cases, and assess their applicability to interpreting code at repository scale. We juxtapose the Repository-Level Prompt Generation technique with RepoCoder, an iterative retrieval and generation method, to highlight the trade-offs inherent in each approach and to establish best practices for their application in cutting-edge coding benchmarks. The interplay between iterative refinement of prompts and the development of advanced retrieval systems forms the core of our discussion, offering a pathway to significantly improve LLM performance in code generation tasks. Insights from this study not only guide the application of these methods but also chart a course for future research to integrate such techniques into broader software engineering contexts.
