Analyzing Prompt Influence on Automated Method Generation: An Empirical Study with Copilot
This work addresses the problem of optimizing prompt engineering for developers who use AI code generation tools. Its contribution is incremental, since it builds on existing prompt engineering research.
The study systematically investigated how eight prompt features affect the quality of code generated by Copilot for 200 Java methods, finding that features like examples and method purpose summaries significantly influence correctness, complexity, size, and similarity to developer code.
Generative AI is changing the way developers interact with software systems, providing services that can produce and deliver new content crafted to satisfy the actual needs of developers. For instance, developers can ask for new code directly from within their IDEs by writing natural language prompts, and integrated services based on generative AI, such as Copilot, immediately respond to prompts by providing ready-to-use code snippets. Formulating the prompt appropriately, by incorporating useful information while avoiding information overload, can be an important factor in obtaining the right piece of code. The task of designing good prompts is known as prompt engineering. In this paper, we systematically investigate how eight prompt features, covering both the style and the content of prompts, influence the correctness, complexity, and size of the generated code and its similarity to the developers' code. Specifically, we use Copilot to generate the implementations of 200 Java methods from 124,800 prompts obtained by systematically combining the eight prompt features. Results show that some prompt features, such as the presence of examples and a summary of the method's purpose, can significantly influence the quality of the result.
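To make "systematically combining prompt features" concrete, the following is a minimal sketch of a full factorial design, written under loud assumptions. Only two feature names, the method purpose summary and examples, come from the text above; the third feature (`return_description`), the comment-style prompt layout, and the helper names `enumerate_configs` and `build_prompt` are hypothetical. Note that 124,800 prompts over 200 methods is 624 prompts per method on average, not 2^8 = 256, so the study's actual combination scheme is richer than a single binary on/off factorial and this sketch does not reproduce it.

```python
from itertools import product

# Hypothetical feature set. "purpose_summary" and "examples" mirror features
# named in the abstract; "return_description" is an illustrative placeholder.
# Each feature is either absent (False) or present (True) in a prompt.
FEATURES = {
    "purpose_summary": [False, True],
    "examples": [False, True],
    "return_description": [False, True],
}


def enumerate_configs(features):
    """Yield every combination of feature levels (a full factorial design)."""
    names = list(features)
    for levels in product(*(features[name] for name in names)):
        yield dict(zip(names, levels))


def build_prompt(signature, config, texts):
    """Assemble a hypothetical comment-style prompt for one Java method.

    Each enabled feature contributes one `//` comment line (in the order the
    features appear in `config`), followed by the method signature that the
    code generator is asked to complete.
    """
    lines = [f"// {texts[name]}" for name, enabled in config.items() if enabled]
    lines.append(signature + " {")
    return "\n".join(lines)


if __name__ == "__main__":
    texts = {
        "purpose_summary": "Returns the sum of a and b.",
        "examples": "Example: add(2, 3) returns 5",
        "return_description": "Returns: an int",
    }
    for config in enumerate_configs(FEATURES):
        print(build_prompt("public static int add(int a, int b)", config, texts))
        print("-----")
```

With three binary features this yields 2^3 = 8 prompt variants per method, so a hypothetical run over 200 methods would produce 1,600 prompts. Adding features or levels multiplies the count, which is why a full factorial design grows quickly.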