SE, LG | Jul 17, 2023

FRANC: A Lightweight Framework for High-Quality Code Generation

Mohammed Latif Siddiq, Beatrice Casey, Joanna C. S. Santos

arXiv:2307.08220v2 | 12.528 citations | h-index: 12

Originality: Incremental advance

AI Analysis

This work addresses code quality and security issues faced by developers who use automated code generation. It is incremental because it builds on existing models, adding filtering and ranking techniques rather than training new ones.

The paper tackles vulnerabilities and quality issues in code generated by transformer-based models by introducing FRANC, a lightweight framework. Its static filter improves the compilability of 9% to 46% of Java suggestions and 10% to 43% of Python suggestions. Its ranker yields an average NDCG@10 gain of 0.0763, and its prompt-based repair fixes up to 80% of prompts.
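For readers unfamiliar with the ranking metric, here is a minimal sketch of how NDCG@k is commonly computed. It uses the linear-gain, log2-discount variant. The summary does not say which variant the paper uses, so treat that choice as an assumption. The relevance labels in the usage lines are hypothetical inputs, not data from the paper.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k items (linear gain, log2 discount)."""
    # Rank 0 (first position) is divided by log2(2) = 1, rank 1 by log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical binary labels (1 = snippet judged high quality) for one prompt,
    # in model order versus after a quality-aware re-ranking.
    before = [0, 1, 0, 1]
    after = [1, 1, 0, 0]
    print(round(ndcg_at_k(before), 3))  # 0.651
    print(ndcg_at_k(after))  # 1.0, since this is the ideal ordering
```

An "NDCG@10 gain" such as the one reported is then the difference between the re-ranked and original scores, averaged over prompts.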

In recent years, the use of transformer-based generative models for automated source code generation has expanded, and these models can generate functional code that meets developers' requirements. However, recent research has revealed that this automatically generated code can contain vulnerabilities and other quality issues. Although researchers and practitioners have tried to improve code generation models, retraining and fine-tuning large language models is time-consuming and resource-intensive. Thus, we describe FRANC, a lightweight framework for recommending more secure, high-quality source code derived from transformer-based code generation models. FRANC includes a static filter that uses heuristics to make the generated code compilable and a quality-aware ranker that sorts the code snippets by a quality score. Moreover, the framework uses prompt engineering to fix persistent quality issues. We evaluated the framework with five Python and Java code generation models and six prompt datasets, including one newly created in this work (SOEval). The static filter improves the compilability of 9% to 46% of Java suggestions and 10% to 43% of Python suggestions. The ranking system improves the NDCG@10 score by 0.0763 on average, and the repair techniques fix up to 80% of prompts. On average, FRANC takes 1.98 seconds for Java and 0.08 seconds for Python.
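The filter-then-rank pipeline described in the abstract can be sketched for Python snippets as follows. This is a minimal illustration under stated assumptions and is not FRANC's implementation. The abstract does not detail the actual heuristics or quality score. Here, compilability is approximated by whether `ast.parse` succeeds. The repair heuristic (dropping trailing lines, as for a cut-off generation) and the quality score (a negated count of a few risky calls in the hypothetical `RISKY_CALLS` set) are assumptions. The names `static_filter`, `quality_score`, and `filter_and_rank` are likewise hypothetical. The prompt-engineering repair step is not shown.

```python
import ast
from typing import List, Optional

# Hypothetical, deliberately crude smell list for this sketch only (matched by call name).
RISKY_CALLS = {"eval", "exec", "system"}


def is_compilable(code: str) -> bool:
    """Approximate compilability: True if the snippet parses as Python source."""
    try:
        ast.parse(code)
    except SyntaxError:
        return False
    return True


def static_filter(code: str) -> Optional[str]:
    """Hypothetical heuristic: drop trailing lines until the snippet parses.

    Returns the repaired snippet, or None if no non-empty parseable prefix remains.
    """
    lines = code.splitlines()
    while lines:
        candidate = "\n".join(lines)
        if is_compilable(candidate):
            return candidate
        lines.pop()
    return None


def quality_score(code: str) -> int:
    """Hypothetical score: minus the number of risky calls (higher is better)."""
    issues = 0
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call):
            func = node.func
            # Plain calls like eval(...) have .id; method calls like os.system(...) have .attr.
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name in RISKY_CALLS:
                issues += 1
    return -issues


def filter_and_rank(snippets: List[str]) -> List[str]:
    """Repair each snippet, drop those still unparseable, then sort best-first."""
    repaired = (static_filter(s) for s in snippets)
    kept = [s for s in repaired if s is not None]
    # sorted() is stable, so ties keep the model's original order.
    return sorted(kept, key=quality_score, reverse=True)


if __name__ == "__main__":
    candidates = [
        "import os\ndef run(cmd):\n    os.system(cmd)\n",
        "import subprocess\ndef run(cmd):\n    subprocess.run(cmd.split(), check=True)\n",
        "def broken(:\n",
    ]
    # The broken snippet is dropped and the subprocess-based snippet ranks first.
    for snippet in filter_and_rank(candidates):
        print(quality_score(snippet), snippet.splitlines()[0])
```

Because the sort is stable, snippets with equal scores keep the generating model's order, so the ranker only reorders where the (assumed) quality signal actually differs.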

