CV · Feb 15, 2025

CalibQuant: 1-Bit KV Cache Quantization for Multimodal LLMs

arXiv:2502.14882v2 · 3 citations · h-index: 12 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses deployment bottlenecks for multimodal LLMs on memory-constrained GPUs, offering a plug-and-play solution with incremental improvements in efficiency.

The paper tackles the computational overhead and memory footprint of KV caches in multimodal LLMs by proposing CalibQuant, a 1-bit quantization strategy with post-scaling and calibration, achieving a 10x throughput increase on InternVL models.

Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance across diverse applications. However, their computational overhead during deployment remains a critical bottleneck. While Key-Value (KV) caching effectively trades memory for computation to enhance inference efficiency, the growing memory footprint from extensive KV caches significantly reduces throughput and restricts prolonged deployment on memory-constrained GPU devices. To address this challenge, we propose CalibQuant, a simple yet highly effective visual quantization strategy that drastically reduces both memory and computational overhead. Specifically, CalibQuant introduces an extreme 1-bit quantization scheme, complemented by novel post-scaling and calibration techniques tailored to the intrinsic patterns of KV caches, thereby ensuring high efficiency without compromising model performance. Leveraging Triton for runtime optimization, we achieve a 10x throughput increase on InternVL models. Our method is designed to be plug-and-play, seamlessly integrating with various existing MLLMs without requiring architectural changes. Extensive experiments confirm that our approach significantly reduces memory usage while maintaining computational efficiency and preserving multimodal capabilities. Code is available at https://github.com/insuhan/calibquant.
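
To make the idea of 1-bit KV cache quantization concrete, here is a minimal pure-Python sketch. It is not taken from the CalibQuant repository. It assumes per-channel min/max quantization and reads "post-scaling" as folding the dequantization scale into the query, so that scores are computed directly on the 1-bit codes. All function names are hypothetical, and the calibration step mentioned in the abstract is not modeled.

```python
def quantize_1bit(rows):
    """Quantize token vectors to 1 bit per entry, per channel (assumed scheme).

    rows: list of equal-length lists of floats, one list per cached token.
    Returns (codes, lo, scale), where codes hold 0/1 and lo/scale are per channel.
    """
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    # A constant channel has zero range; use 1.0 to avoid dividing by zero.
    scale = [(max(c) - min(c)) or 1.0 for c in cols]
    codes = [
        [1 if (v - l) / s >= 0.5 else 0 for v, l, s in zip(row, lo, scale)]
        for row in rows
    ]
    return codes, lo, scale


def dequantize_1bit(codes, lo, scale):
    """Map each bit back to its channel's min (0) or max (1)."""
    return [[c * s + l for c, l, s in zip(row, lo, scale)] for row in codes]


def scores_dequantize_first(q, codes, lo, scale):
    """Baseline: rebuild full-precision keys, then take dot products with q."""
    k_hat = dequantize_1bit(codes, lo, scale)
    return [sum(qi * ki for qi, ki in zip(q, row)) for row in k_hat]


def scores_post_scaled(q, codes, lo, scale):
    """Same scores without rebuilding the keys.

    Uses q . (codes * scale + lo) == (q * scale) . codes + q . lo,
    so the scale is applied once to q instead of to every cached key.
    """
    q_scaled = [qi * s for qi, s in zip(q, scale)]
    offset = sum(qi * l for qi, l in zip(q, lo))
    return [sum(qs for qs, c in zip(q_scaled, row) if c) + offset for row in codes]


keys = [[0.0, 2.0], [1.0, -2.0], [0.2, 1.5]]  # 3 cached tokens, 2 channels
query = [0.5, -1.0]
codes, lo, scale = quantize_1bit(keys)
print(codes)
print(scores_post_scaled(query, codes, lo, scale))
```

Both scoring functions return the same values. The post-scaled form only touches the bit codes in the inner loop, which is the kind of reordering a fused GPU kernel could exploit.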

Code Implementations: 1 repo
