SE, AI | Jun 29, 2024

Beyond Functional Correctness: Investigating Coding Style Inconsistencies in Large Language Models

arXiv:2407.00456v2 | 22 citations
Originality: Incremental advance
AI Analysis

This work addresses the coding style mismatches that software developers encounter when using LLMs for code generation. It is an incremental advance, building on existing research on code generation accuracy.

The paper investigates coding style inconsistencies between code generated by large language models (LLMs) and human developers, revealing differences in readability, conciseness, and robustness, and provides solutions to address these issues.

Large language models (LLMs) have brought a paradigm shift to the field of code generation, offering the potential to enhance the software development process. However, previous research has mainly focused on the accuracy of code generation, while coding style differences between LLMs and human developers remain under-explored. In this paper, we empirically analyze the differences in coding style between code generated by mainstream Code LLMs and code written by human developers, and summarize a taxonomy of coding style inconsistencies. Specifically, we first summarize the types of coding style inconsistencies by manually analyzing a large number of generation results. We then compare the code generated by Code LLMs with the code written by human programmers in terms of readability, conciseness, and robustness. The results reveal that LLMs and developers have different coding styles. Additionally, we study the possible causes of these inconsistencies and provide some solutions to alleviate the problem.

