LG | AI | Dec 18, 2025

CALM: A CKA-Guided Adaptive Layer-Wise Modularization Framework for LLM Quantization

Jinhao Zhang, Yunquan Zhang, Daning Chen, Jun Sun, Zicheng Yan

arXiv:2512.16282v2 | 9.42 citations | h-index: 4

Originality: Incremental advance

AI Analysis

This work addresses the inefficiency of quantizing LLMs for deployment. It is an incremental advance: it builds on existing post-training quantization (PTQ) methods and contributes a novel layer-wise adaptation approach.

The paper tackles the problem of uniform quantization strategies in large language models by proposing CALM, a framework that adaptively selects optimal quantization strategies per layer using CKA, resulting in improved perplexity and downstream task performance over baselines and state-of-the-art methods.

Current mainstream post-training quantization methods for large language models typically apply a uniform quantization strategy across all network layers, overlooking the substantial differences in algorithmic suitability among layers. To address this limitation, we propose CALM (A CKA-guided Adaptive Layer-wise Modularization), a fine-tuning-free, plug-and-play framework for algorithmic heterogeneous quantization. CALM independently evaluates multiple PTQ algorithms on each layer and employs Linear Centered Kernel Alignment (CKA) as a metric to automatically select the optimal quantization strategy per layer. The individually optimized strategies are then integrated to construct a hybrid quantized model. Experiments demonstrate that our approach consistently outperforms both uniform quantization baselines and state-of-the-art mixed-precision methods across mainstream LLMs, including LLaMA and Qwen, in terms of perplexity (PPL) and downstream task performance.
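The per-layer selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each candidate PTQ algorithm yields a per-layer output activation matrix of shape (samples, features) on some calibration data, and that the candidate whose output has the highest linear CKA with the full-precision output is kept for that layer. The function names `linear_cka` and `select_per_layer`, the candidate labels `method_a` and `method_b`, and the noise-based stand-ins for quantization error are all hypothetical.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two activation matrices of shape (n_samples, n_features)."""
    # Center each feature column; CKA is defined on centered representations.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def select_per_layer(reference, candidates):
    """Pick, for each layer, the candidate whose output is most CKA-similar
    to the full-precision reference.

    reference:  list of arrays, one per layer (full-precision outputs)
    candidates: dict mapping a method name to a list of arrays, one per layer
    """
    plan = []
    for i, ref in enumerate(reference):
        scores = {name: linear_cka(ref, outs[i]) for name, outs in candidates.items()}
        plan.append(max(scores, key=scores.get))
    return plan


# Toy data: additive noise stands in for quantization error (an assumption
# made for illustration only; no real PTQ algorithm is run here).
rng = np.random.default_rng(0)
reference = [rng.normal(size=(64, 16)) for _ in range(3)]
noise_levels = {
    "method_a": [0.01, 0.5, 0.01],  # hypothetical: good on layers 0 and 2
    "method_b": [0.5, 0.01, 0.5],   # hypothetical: good on layer 1
}
candidates = {
    name: [ref + s * rng.normal(size=ref.shape) for ref, s in zip(reference, levels)]
    for name, levels in noise_levels.items()
}
plan = select_per_layer(reference, candidates)
print(plan)
```

The resulting `plan` is a per-layer list of method names, which corresponds to the "hybrid quantized model" described above: each layer would be quantized with the method named at its index. Linear CKA is invariant to isotropic scaling and orthogonal transformations of the features, which is one reason it may be preferred over a raw reconstruction error when comparing representations.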
