CV · Mar 5

MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models

arXiv:2603.04800v1 · Has Code
Originality: Incremental advance
AI Analysis

This work provides an incremental improvement in post-training quantization for multimodal large language models, aiming to reduce computational costs for researchers and practitioners working with these models.

This paper addresses challenges in applying post-training quantization (PTQ) to multimodal large language models (MLLMs) by proposing MASQuant. It introduces Modality-Aware Smoothing (MAS) to learn separate smoothing factors for different modalities and Cross-Modal Compensation (CMC) using SVD whitening to enable unified quantization across modalities. MASQuant demonstrates stable and competitive quantization performance across dual-modal and tri-modal MLLMs.

Post-training quantization (PTQ) with computational invariance for Large Language Models (LLMs) has demonstrated remarkable advances; however, its application to Multimodal Large Language Models (MLLMs) presents substantial challenges. In this paper, we analyze SmoothQuant as a case study and identify two critical issues: Smoothing Misalignment and Cross-Modal Computational Invariance. To address these issues, we propose Modality-Aware Smoothing Quantization (MASQuant), a novel framework that introduces (1) Modality-Aware Smoothing (MAS), which learns separate, modality-specific smoothing factors to prevent Smoothing Misalignment, and (2) Cross-Modal Compensation (CMC), which addresses Cross-Modal Computational Invariance by using SVD whitening to transform multi-modal activation differences into low-rank forms, enabling unified quantization across modalities. MASQuant demonstrates stable quantization performance across both dual-modal and tri-modal MLLMs. Experimental results show that MASQuant is competitive with state-of-the-art PTQ algorithms. Source code: https://github.com/alibaba/EfficientAI.
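
The two issues named in the abstract can be illustrated with a small NumPy sketch of SmoothQuant-style smoothing. This is not the MASQuant implementation: the factors here are computed in closed form rather than learned, the data is synthetic, and the 20x scale on the "vision" activations, the helper names `smoothing_factors` and `smooth`, and the choice `alpha=0.5` are all assumptions made for illustration.

```python
import numpy as np

def smoothing_factors(X, W, alpha=0.5, eps=1e-8):
    """SmoothQuant-style per-input-channel factors:
    s_j = max|X[:, j]|^alpha / max|W[j, :]|^(1 - alpha)."""
    act_max = np.abs(X).max(axis=0)   # shape (in_features,)
    w_max = np.abs(W).max(axis=1)     # shape (in_features,); W is (in, out)
    return np.maximum(act_max, eps) ** alpha / np.maximum(w_max, eps) ** (1 - alpha)

def smooth(X, W, s):
    """Move activation range into the weights: (X / s) @ (diag(s) W) == X @ W."""
    return X / s, W * s[:, None]

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))                  # toy linear layer, (in, out)
X_text = rng.normal(size=(16, 8))            # synthetic text-token activations
X_vision = rng.normal(size=(16, 8)) * 20.0   # assumption: vision tokens have a much larger range

# Separate factors per modality, and one factor shared across all tokens.
s_text = smoothing_factors(X_text, W)
s_vision = smoothing_factors(X_vision, W)
s_shared = smoothing_factors(np.vstack([X_text, X_vision]), W)

# Computational invariance: smoothing leaves the layer output unchanged.
Xs_text, Ws_text = smooth(X_text, W, s_text)
invariance_ok = np.allclose(Xs_text @ Ws_text, X_text @ W)

# With separate factors, each modality implies a different smoothed weight matrix.
_, Ws_vision = smooth(X_vision, W, s_vision)
weights_differ = not np.allclose(Ws_text, Ws_vision)

print("invariance holds:", invariance_ok)
print("shared factor equals vision factor:", np.allclose(s_shared, s_vision))
print("per-modality smoothed weights differ:", weights_differ)
```

In this toy setting the shared factor is determined entirely by the larger-range modality, which is one plausible reading of Smoothing Misalignment. Using separate factors avoids that, but each modality then implies its own smoothed weight matrix, so a single quantized weight no longer preserves the output for every modality. Per the abstract, CMC addresses this gap with a low-rank correction; that step is not sketched here.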

