LLM Code Smells: A Taxonomy and Detection Approach
For developers integrating LLMs into software, this work documents poor coding practices and provides a detection tool to improve software quality.
The paper presents a taxonomy of nine LLM code smells and a static analysis tool, SpecDetect4LLM, for their detection. An evaluation on 692 open-source projects shows that 73.5% of the systems are affected, and that the tool detects the smells with 91.3% precision and 71.8% recall.
Large Language Models (LLMs) are increasingly integrated into software systems for diverse purposes, owing to their versatility, flexibility, and ability to simulate human reasoning to some extent. However, poor integration of LLM inference in source code can undermine software system quality. Inadequate coding practices for LLM integration must therefore be documented to help developers mitigate such issues. Following our earlier work on LLM code smells, this paper consolidates and refines the concept by presenting a self-contained taxonomy and a catalog of nine LLM code smells. We also introduce SpecDetect4LLM, a static source code analysis tool for their detection, and conduct extensive empirical evaluations of its detection effectiveness (precision and recall) and of the prevalence of LLM code smells across 692 open-source software projects (171,194 source files). Our results show that LLM code smells affect 73.5% of the analyzed systems, and that SpecDetect4LLM detects them with a precision of 91.3% and a recall of 71.8%.
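To make the notions of static detection, precision, and recall concrete, the following is a minimal sketch and not the SpecDetect4LLM implementation. The rule it checks (a call to a method named `create` that does not pass an explicit `temperature` keyword) is a hypothetical example chosen for illustration only; it is not claimed to be one of the nine smells in the catalog. The function names, the `SAMPLE` snippet, and the ground-truth line set are likewise assumptions.

```python
import ast

# Hypothetical snippet to analyze; it is parsed, never executed.
SAMPLE = '''\
client.chat.completions.create(model="m", messages=msgs)
client.chat.completions.create(model="m", messages=msgs, temperature=0.2)
client.chat.completions.create(**options)
'''


def find_calls_missing_keyword(source: str, method_name: str, keyword: str) -> list[int]:
    """Return sorted line numbers of calls to `method_name` that omit `keyword`."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Resolve the called name for both `obj.method(...)` and `method(...)`.
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name != method_name:
            continue
        # A `**kwargs` expansion has arg=None and may supply the keyword,
        # so this sketch conservatively does not flag it.
        if any(kw.arg is None or kw.arg == keyword for kw in node.keywords):
            continue
        flagged.append(node.lineno)
    return sorted(flagged)


def precision_recall(reported: set[int], actual: set[int]) -> tuple[float, float]:
    """Precision = TP / reported, recall = TP / actual (0.0 when undefined)."""
    true_positives = len(reported & actual)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    reported = set(find_calls_missing_keyword(SAMPLE, "create", "temperature"))
    # Assumed ground truth: a human reviewer judges lines 1 and 3 as affected.
    print(reported, precision_recall(reported, {1, 3}))
```

In this sketch, skipping the `**options` call avoids a possible false positive at the cost of a possible miss. A purely static rule faces this kind of trade-off whenever an argument's value is only known at run time, which is one way a detector can end up with higher precision than recall.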