CR AI · Feb 18, 2025

Investigating the Impact of Quantization Methods on the Safety and Reliability of Large Language Models

Artyom Kharinaev, Viktor Moskvoretskii, Egor Shvetsov, Kseniia Studenikina, Bykov Mikhail, Evgeny Burnaev

arXiv:2502.15799v2 · 17.614 citations · h-index: 6 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses safety concerns for users of compressed LLMs and highlights critical trade-offs between efficiency and trustworthiness, but it is incremental: it builds on existing quantization methods and contributes new evaluations.

The study examines how quantization methods affect the safety and reliability of large language models. It finds that both post-training quantization and quantization-aware training can degrade safety alignment, and that no single method consistently outperforms the others across benchmarks, precision settings, or models.

Large Language Models (LLMs) are powerful tools for modern applications, but their computational demands limit accessibility. Quantization offers efficiency gains, yet its impact on safety and trustworthiness remains poorly understood. To address this, we introduce OpenMiniSafety, a human-curated safety dataset with 1,067 challenging questions to rigorously evaluate model behavior. We publicly release human safety evaluations for four LLMs (both quantized and full-precision), totaling 4,268 annotated question-answer pairs. By assessing 66 quantized variants of these models, produced with four post-training quantization (PTQ) and two quantization-aware training (QAT) methods, across four safety benchmarks, including human-centric evaluations, we uncover critical safety performance trade-offs. Our results show that both PTQ and QAT can degrade safety alignment, with QAT techniques such as QLoRA or STE performing less safely. No single method consistently outperforms the others across benchmarks, precision settings, or models, highlighting the need for safety-aware compression strategies. Furthermore, precision-specialized methods (e.g., QUIK and AWQ for 4-bit, AQLM and Q-PET for 2-bit) excel at their target precision, meaning that these methods are not uniformly better at compression but rather represent different approaches.
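To make the precision settings mentioned above concrete, here is a minimal sketch of plain symmetric round-to-nearest quantization, the simplest form of post-training quantization. It is not one of the methods evaluated in the paper (QUIK, AWQ, AQLM, Q-PET, QLoRA, and STE are considerably more involved). The function names and the weight values are hypothetical and chosen only for illustration.

```python
def quantize_rtn(weights, bits):
    """Symmetric round-to-nearest quantization of a list of floats.

    Returns (integer codes, scale). Illustrative baseline only; this is an
    assumption for the example, not a method from the paper.
    """
    if bits < 2:
        raise ValueError("bits must be >= 2 for a signed symmetric grid")
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit, 1 for 2-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest grid point and clamp to the grid.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


def mean_abs_error(weights, bits):
    """Average absolute reconstruction error at the given bit width."""
    codes, scale = quantize_rtn(weights, bits)
    restored = dequantize(codes, scale)
    return sum(abs(w - r) for w, r in zip(weights, restored)) / len(weights)


weights = [0.7, -0.35, 0.1, -0.05, 0.0]  # hypothetical weight values
for bits in (8, 4, 2):
    print(bits, mean_abs_error(weights, bits))
```

On these example values the reconstruction error grows as the bit width shrinks, which is the kind of information loss that specialized 4-bit and 2-bit methods try to limit. The sketch says nothing about safety behavior, which the paper measures with benchmarks and human evaluation.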
