SE · Apr 3

Precision or Peril: A PoC of Python Code Quality from Quantized Large Language Models

Eric L. Melin, Adam J. Torek, Nasir U. Eisty, Casey Kennington

arXiv:2411.10656 · 17.91 citations · h-index: 4 · Has Code

Predicted impact: top 19% in SE · last 90 days · Originality: Synthesis-oriented

AI Analysis

This work addresses code quality concerns for software developers using quantized LLMs for code generation, but it is incremental as it builds on existing quantization and evaluation methods.

This study investigated the code generation performance of smaller large language models (LLMs) and the effects of quantization, finding that while smaller LLMs can produce functional code, their benchmark performance is limited and quantization impacts vary, with generated code showing quality and maintainability issues.

Context: Large Language Models (LLMs) like GPT-5 and LLaMA-405b exhibit advanced code generation abilities, but their deployment demands substantial computational resources and energy. Quantization can reduce memory footprint and hardware requirements, yet may degrade code quality. Objective: This study investigates the code generation performance of smaller LLMs, examines the effect of quantization, and identifies common code quality issues as a proof of concept (PoC). Method: Four open-source LLMs are evaluated on Python benchmarks using code similarity metrics, with an analysis of 8-bit and 4-bit quantization, alongside static code quality assessment. Results: While smaller LLMs can generate functional code, their benchmark performance is limited. Quantization impacts are variable, and generated code exhibits quality and maintainability concerns. Conclusions: LLM-generated code should be carefully validated before integration into software projects.
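
To make the memory-footprint claim concrete, the sketch below estimates how much memory a model's weights need at 16-bit, 8-bit, and 4-bit precision. It is a back-of-the-envelope illustration, not the paper's procedure: the 7-billion-parameter count and the helper name `weight_memory_gib` are assumptions chosen for the example, and the estimate ignores activations, the KV cache, and the per-block scale metadata that real quantization schemes store.

```python
def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB.

    Assumes every weight is stored at the same bit width; real quantized
    checkpoints add some overhead for scales and zero points.
    """
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Hypothetical parameter count for illustration (not a figure from the paper).
NUM_PARAMS = 7_000_000_000

if __name__ == "__main__":
    for bits in (16, 8, 4):
        gib = weight_memory_gib(NUM_PARAMS, bits)
        print(f"{bits:>2}-bit weights: about {gib:.2f} GiB")
```

Under these assumptions, halving the bit width halves the weight storage, which is why 8-bit and 4-bit variants can fit on smaller hardware. Whether the generated code stays acceptable at those precisions is the question the study examines.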

