CR AI | Sep 16, 2025

A Systematic Evaluation of Parameter-Efficient Fine-Tuning Methods for the Security of Code LLMs

Kiho Lee, Jungkon Kim, Doowon Kim, Hyoungshick Kim

arXiv:2509.12649v1 | 3.6 | h-index: 4

Originality: Incremental advance

AI Analysis

This work addresses security risks in software development by evaluating how fine-tuning can make code LLMs generate more secure code, and it offers practical guidance for building more resilient systems. It is rated an incremental advance because it evaluates existing methods.

The paper addresses insecure code generation by code LLMs by evaluating seven parameter-efficient fine-tuning (PEFT) methods. Prompt-tuning raised the Overall-Secure-Rate on CodeGen2 16B from the 67.28% baseline to 80.86%, a 13.5-point gain, and tuning the sampling temperature raised it further to 87.65%. Relative to the baseline, that final figure corresponds to about 203,700 fewer vulnerable snippets per million generated.
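The headline figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the three Overall-Secure-Rate percentages quoted in the abstract (67.28% baseline, 80.86% with prompt-tuning, 87.65% after temperature tuning). It assumes that every snippet not counted as secure is counted as vulnerable, so a gain in secure rate maps directly to fewer vulnerable snippets. The function names are hypothetical and are not taken from the paper.

```python
# Overall-Secure-Rate figures quoted in the abstract, in percent.
BASELINE = 67.28
PROMPT_TUNING = 80.86
PROMPT_TUNING_TUNED_TEMPERATURE = 87.65


def point_gain(after: float, before: float) -> float:
    """Percentage-point difference between two secure rates."""
    return round(after - before, 2)


def fewer_vulnerable_per_million(after: float, before: float) -> int:
    """Reduction in vulnerable snippets per one million generated.

    Assumption: a snippet is either secure or vulnerable, so each
    percentage point of secure rate equals 10,000 snippets per million.
    """
    return round((after - before) / 100 * 1_000_000)


print(point_gain(PROMPT_TUNING, BASELINE))
print(point_gain(PROMPT_TUNING_TUNED_TEMPERATURE, BASELINE))
print(fewer_vulnerable_per_million(PROMPT_TUNING_TUNED_TEMPERATURE, BASELINE))
```

Under that assumption, the difference between 80.86% and 67.28% is 13.58 points, which the paper reports as a 13.5-point gain, and the 20.37-point difference between 87.65% and 67.28% gives the 203,700-per-million figure.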

Code-generating Large Language Models (LLMs) significantly accelerate software development. However, their frequent generation of insecure code presents serious risks. We present a comprehensive evaluation of seven parameter-efficient fine-tuning (PEFT) techniques, demonstrating substantial gains in secure code generation without compromising functionality. Our research identifies prompt-tuning as the most effective PEFT method, achieving an 80.86% Overall-Secure-Rate on CodeGen2 16B, a 13.5-point improvement over the 67.28% baseline. Optimizing decoding strategies through sampling temperature further elevated security to 87.65%. This equates to a reduction of approximately 203,700 vulnerable code snippets per million generated. Moreover, prompt and prefix tuning increase robustness against poisoning attacks in our TrojanPuzzle evaluation, with strong performance against CWE-79 and CWE-502 attack vectors. Our findings generalize across Python and Java, confirming prompt-tuning's consistent effectiveness. This study provides essential insights and practical guidance for building more resilient software systems with LLMs.
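The abstract attributes the gain from 80.86% to 87.65% to the choice of sampling temperature but does not give the temperature values or the decoding setup. As general background only, the sketch below shows the standard temperature-scaled softmax that sampling-based decoders commonly use: logits are divided by a temperature before normalization, so lower temperatures concentrate probability on the top-scoring token. The function names and the example logits are hypothetical; this is not the authors' decoding code.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical logits for a four-token vocabulary.
example_logits = [2.0, 1.0, 0.5, -1.0]
rng = random.Random(0)  # seeded so repeated runs draw the same tokens

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(example_logits, t)
    print(t, [round(p, 3) for p in probs], sample_token(example_logits, t, rng))
```

Which temperature works best for secure code generation is an empirical question that the paper answers through evaluation; the sketch only illustrates the mechanism being tuned.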
