Prompt engineering and framework: implementation to increase code reliability based guideline for LLMs
This work addresses the challenge of generating reliable code for AI-driven programming tasks, offering an incremental improvement in efficiency and performance.
The paper tackles the problem of improving the accuracy and reliability of Python code generated by Large Language Models (LLMs) by introducing a novel prompt template, which outperforms zero-shot and Chain-of-Thought (CoT) prompting on the HumanEval dataset while using fewer tokens than CoT.
In this paper, we propose a novel prompting approach aimed at enhancing the ability of Large Language Models (LLMs) to generate accurate Python code. Specifically, we introduce a prompt template designed to improve the quality and correctness of generated code snippets, so that they pass their tests and produce reliable results. In experiments on two state-of-the-art LLMs using the HumanEval dataset, our approach outperforms the widely studied zero-shot and Chain-of-Thought (CoT) methods on the Pass@k metric. It achieves these improvements with significantly lower token usage than the CoT approach, which makes it both effective and resource-efficient: it lowers computational demands and reduces the environmental footprint of LLM use. These findings highlight the potential of tailored prompting strategies to optimize code generation performance, paving the way for broader applications in AI-driven programming tasks.
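For readers unfamiliar with the Pass@k metric, the following is a minimal sketch of how it is commonly computed. It assumes the standard unbiased estimator introduced alongside HumanEval, 1 - C(n - c, k) / C(n, k), where n is the number of samples generated for a problem and c is the number that pass all tests. The abstract does not state which estimator or sample counts were used in the experiments, so the function names `pass_at_k` and `mean_pass_at_k` and the figures in the usage note are illustrative assumptions only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which pass all tests.

    Assumption: this is the standard unbiased estimator, not necessarily
    the exact procedure used in the paper.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # With fewer than k failing samples, every size-k subset
    # contains at least one passing sample.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds one (n, c) pair per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

As a hypothetical usage example, `mean_pass_at_k([(10, 3), (10, 10)], k=1)` averages the per-problem estimates for two problems with 10 samples each, one with 3 passing samples and one with all 10 passing.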