LLM4TDD: Best Practices for Test Driven Development Using Large Language Models
This paper addresses software reliability, a concern for both developers and researchers, but its contribution is incremental: it applies existing methods to a new context.
The paper aims to improve software correctness through LLM4TDD, a method that guides Large Language Models to generate code iteratively using test-driven development. An empirical evaluation with ChatGPT and LeetCode problems finds that different test, prompt, and problem attributes affect the method's efficacy.
In today's society, we are becoming increasingly dependent on software systems. However, we also constantly witness the negative impacts of buggy software. Program synthesis aims to improve software correctness by automatically generating the program given an outline of the expected behavior. For decades, program synthesis has been an active research field, with recent approaches looking to incorporate Large Language Models to help generate code. This paper explores the concept of LLM4TDD, where we guide Large Language Models to generate code iteratively using a test-driven development methodology. We conduct an empirical evaluation using ChatGPT and coding problems from LeetCode to investigate the impact of different test, prompt and problem attributes on the efficacy of LLM4TDD.
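The iterative, test-driven generation described above can be pictured as a small feedback loop. The following is only a minimal sketch under stated assumptions, not the paper's actual procedure: `generate` is a hypothetical callable standing in for a call to a model such as ChatGPT, the `(arguments, expected)` test format is invented for illustration, and `stub_model` is a hard-coded placeholder so that the loop runs without any model access.

```python
from typing import Callable, List, Optional, Tuple

# Assumed test format: (positional arguments, expected return value).
TestCase = Tuple[tuple, object]


def run_tests(source: str, func_name: str, tests: List[TestCase]) -> List[str]:
    """Load candidate source and return one message per failing test."""
    namespace: dict = {}
    try:
        # exec is acceptable only for trusted or sandboxed code.
        exec(source, namespace)
        func = namespace[func_name]
    except Exception as exc:
        return [f"code did not load: {exc!r}"]
    failures = []
    for args, expected in tests:
        try:
            actual = func(*args)
        except Exception as exc:
            failures.append(f"{func_name}{args} raised {exc!r}")
            continue
        if actual != expected:
            failures.append(
                f"{func_name}{args} returned {actual!r}, expected {expected!r}"
            )
    return failures


def tdd_loop(
    generate: Callable[[List[str]], str],
    func_name: str,
    tests: List[TestCase],
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Ask for code, run the tests, and feed failures back until all pass."""
    feedback: List[str] = []
    for round_number in range(1, max_rounds + 1):
        source = generate(feedback)  # hypothetical model call
        feedback = run_tests(source, func_name, tests)
        if not feedback:
            return source, round_number
    return None, max_rounds


def stub_model(feedback: List[str]) -> str:
    """Placeholder for an LLM: wrong on the first try, fixed after feedback."""
    if not feedback:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


if __name__ == "__main__":
    tests = [((1, 2), 3), ((0, 0), 0)]
    source, rounds = tdd_loop(stub_model, "add", tests)
    print(f"passed after {rounds} round(s)")
```

In this sketch the failure messages are the only information passed back to the generator. How tests and feedback are actually phrased in the prompt is one of the attributes the paper sets out to study, so the format used here should be read as one arbitrary choice.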