SE AI PL · Jun 28, 2025

Smaller = Weaker? Benchmarking Robustness of Quantized LLMs in Code Generation

Sen Fang, Weiyuan Ding, Antonio Mastropaolo, Bowen Xu

arXiv:2506.22776v111.35 citations · h-index: 14

Originality: Incremental advance

AI Analysis

The paper addresses the problem of deploying LLMs reliably and efficiently, giving developers and researchers insight into the robustness trade-offs of quantization. It is incremental in that it builds on existing quantization methods.

The paper investigates how quantization affects the robustness of Large Language Models in code generation, finding that quantized models often show superior resilience to adversarial attacks and noise perturbations compared to full-precision counterparts, with 51.59% versus 42.86% of adversarial experiments favoring quantized models.

Quantization has emerged as a mainstream method for compressing Large Language Models (LLMs), reducing memory requirements and accelerating inference without architectural modifications. While existing research primarily focuses on evaluating the effectiveness of quantized LLMs compared to their original counterparts, the impact on robustness remains largely unexplored. In this paper, we present the first systematic investigation of how quantization affects the robustness of LLMs in code generation tasks. Through extensive experiments across four prominent LLM families (LLaMA, DeepSeek, CodeGen, and StarCoder) with parameter scales ranging from 350M to 33B, we evaluate robustness from dual perspectives: adversarial attacks on input prompts and noise perturbations on model architecture. Our findings challenge conventional wisdom by demonstrating that quantized LLMs often exhibit superior robustness compared to their full-precision counterparts, with 51.59% versus 42.86% of our adversarial experiments showing better resilience in quantized LLMs. Similarly, our noise perturbation experiments confirm that LLMs after quantization generally withstand higher levels of weight disturbances. These results suggest that quantization not only reduces computational requirements but can actually enhance LLMs' reliability in code generation tasks, providing valuable insights for developing more robust and efficient LLM deployment strategies.
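
To make the two ingredients in the abstract concrete, the sketch below shows one simple way weights can be quantized and one simple way a weight disturbance can be injected. It is an illustration under stated assumptions, not the paper's procedure: the abstract does not say which quantization scheme or noise model the authors use, so symmetric round-to-nearest 8-bit quantization and additive Gaussian noise are assumed here, and the names `quantize_int8`, `dequantize`, and `add_gaussian_noise` are hypothetical.

```python
import random


def quantize_int8(weights):
    """Symmetric round-to-nearest quantization to signed 8-bit integers.

    Assumed scheme for illustration only; the paper's method is not
    stated in the abstract.
    """
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the integer codes back to approximate floating-point weights."""
    return [v * scale for v in q]


def add_gaussian_noise(weights, sigma, rng):
    """Perturb each weight with zero-mean Gaussian noise (assumed noise model)."""
    return [w + rng.gauss(0.0, sigma) for w in weights]


rng = random.Random(0)
weights = [rng.uniform(-1.0, 1.0) for _ in range(1000)]  # toy stand-in for a weight tensor

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Largest per-weight error introduced by quantization alone.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# The same disturbance applied to the original and the quantized weights.
noisy_full = add_gaussian_noise(weights, 0.01, rng)
noisy_quant = add_gaussian_noise(restored, 0.01, rng)
```

With round-to-nearest, the quantization error per weight is bounded by half the scale. A robustness study in the spirit of the abstract would then compare how model outputs degrade as `sigma` grows for each variant, which requires a real model and benchmark and is outside this toy example.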
