SE AI | Aug 1, 2025

A Note on Code Quality Score: LLMs for Maintainable Large Codebases

Sherman Wong, Jalaj Bhandari, Leo Zhou Fan Yang, Xylan Xu, Yi Zhuang, Cem Cayiroglu, Payal Bhuptani, Sheela Yadawad, Hung Duong

arXiv:2508.02732v1 | h-index: 8

Originality: Incremental advance

AI Analysis

This work addresses code maintenance problems for developers in large-scale industrial environments, but it is incremental because it builds on existing LLM fine-tuning techniques.

The paper tackles the challenge of maintaining code quality in large-scale software systems by introducing the Code Quality Score (CQS) system, which uses fine-tuned Llama3 models to detect code quality issues and provide critiques, achieving a 60% week-over-week user helpfulness rate in an industrial setting.

Maintaining code quality in large-scale software systems presents significant challenges, particularly in settings where a large number of engineers work concurrently on a codebase. This paper introduces the Code Quality Score (CQS) system, which automatically detects issues in a set of code changes and provides actionable insights. At its core, the CQS system is powered by two Llama3 models, fine-tuned with SFT and offline RL approaches, to (a) detect common code quality issues related to coding best practices and (b) provide good "critiques" of LLM-generated code review, respectively. To maintain a good user experience, we layer the system with hand-crafted rules that filter out incorrect responses and hallucinations. Offline evaluations show that our CQS system achieves an impressive precision rate for identifying valid issues. The system has already been rolled out to developers in an industrial-scale setting and has consistently achieved a 60% week-over-week user helpfulness rate, demonstrating its effectiveness in a real-world environment. In this paper, we present details of the CQS system along with some learnings on curating developer feedback to create training data for LLM fine-tuning.
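The layering described in the abstract (an issue-detection model, a critique model, then hand-crafted rules) can be illustrated with a minimal Python sketch. Everything named here is a hypothetical stand-in, not the authors' implementation: `Issue`, `run_pipeline`, and the two example rules are invented for illustration, the two models are represented as plain callables, and the order in which the critic and the rules are applied is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Issue:
    """A candidate code quality issue (hypothetical structure)."""
    file: str
    line: int
    message: str


# Hypothetical stand-ins for the two fine-tuned models.
Detector = Callable[[str], List[Issue]]  # diff text -> candidate issues
Critic = Callable[[str, Issue], bool]    # (diff, issue) -> True if the issue looks valid
Rule = Callable[[str, Issue], bool]      # (diff, issue) -> True if the issue may be shown


def rule_nonempty_message(diff: str, issue: Issue) -> bool:
    # Example hand-crafted rule (assumed): drop issues with no usable text.
    return bool(issue.message.strip())


def rule_file_in_diff(diff: str, issue: Issue) -> bool:
    # Example hand-crafted rule (assumed): drop issues that point at a file
    # the diff never mentions, a cheap check against hallucinated locations.
    return issue.file in diff


def run_pipeline(diff: str, detect: Detector, critique: Critic,
                 rules: List[Rule]) -> List[Issue]:
    """Keep only issues that pass the critic and every rule."""
    kept = []
    for issue in detect(diff):
        if not critique(diff, issue):
            continue  # critic judged the candidate issue to be invalid
        if all(rule(diff, issue) for rule in rules):
            kept.append(issue)
    return kept
```

In this sketch an issue reaches the developer only if every layer accepts it, which trades recall for precision; that matches the abstract's stated emphasis on precision and user experience, though the real system's rules and thresholds are not described here.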
