SE AI CL PL · Mar 1, 2024

Comparing large language models and human programmers for generating programming code

arXiv:2403.00894v2 · 46 citations · h-index: 14 · Adv Sci

Originality: Incremental advance

AI Analysis

This work addresses the problem of automating code generation for software developers, showing incremental improvements in model performance.

The study evaluated seven large language models for generating programming code, finding that GPT-4 outperformed the others and, with the optimal prompt strategy, outperformed 85% of human participants in most of the LeetCode and GeeksforGeeks coding contests evaluated.

We systematically evaluated the performance of seven large language models in generating programming code using various prompt strategies, programming languages, and task difficulties. GPT-4 substantially outperforms other large language models, including Gemini Ultra and Claude 2. The coding performance of GPT-4 varies considerably with different prompt strategies. In most LeetCode and GeeksforGeeks coding contests evaluated in this study, GPT-4 employing the optimal prompt strategy outperforms 85 percent of human participants. Additionally, GPT-4 demonstrates strong capabilities in translating code between different programming languages and in learning from past errors. The computational efficiency of the code generated by GPT-4 is comparable to that of human programmers. These results suggest that GPT-4 has the potential to serve as a reliable assistant in programming code generation and software development.
