LG · AI · Dec 18, 2025

CALM: A CKA-Guided Adaptive Layer-Wise Modularization Framework for LLM Quantization

arXiv:2512.16282v2 · 2 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses quantization inefficiencies in LLM deployment. It is an incremental advance: it builds on existing post-training quantization (PTQ) methods and adds a novel layer-wise adaptation approach.

The paper tackles the limitation of applying one uniform quantization strategy to every layer of a large language model. It proposes CALM, a framework that uses CKA to select a quantization strategy for each layer individually, and reports improved perplexity and downstream task performance over uniform baselines and state-of-the-art methods.

Current mainstream post-training quantization (PTQ) methods for large language models typically apply a uniform quantization strategy across all network layers, overlooking the substantial differences in algorithmic suitability among layers. To address this limitation, we propose CALM (A CKA-guided Adaptive Layer-wise Modularization), a fine-tuning-free, plug-and-play framework for algorithmic heterogeneous quantization. CALM independently evaluates multiple PTQ algorithms on each layer and employs Linear Centered Kernel Alignment (CKA) as a metric to automatically select the optimal quantization strategy per layer. The individually optimized strategies are then integrated to construct a hybrid quantized model. Experiments demonstrate that our approach consistently outperforms both uniform quantization baselines and state-of-the-art mixed-precision methods across mainstream LLMs, including LLaMA and Qwen, in terms of perplexity (PPL) and downstream task performance.
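
The per-layer selection described in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation. It assumes that each candidate PTQ algorithm is scored by the linear CKA between a layer's full-precision output and its quantized output on the same calibration batch, and that the highest-scoring candidate is kept. The abstract does not state what CKA is computed against, so that comparison target is an assumption, and the names `linear_cka` and `select_per_layer` are hypothetical.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two activation matrices of shape (n_samples, n_features).

    Uses the standard linear form ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    on column-centered inputs. The two matrices may have different widths.
    """
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def select_per_layer(fp_outputs, candidate_outputs):
    """Pick, for each layer, the candidate whose output best matches full precision.

    fp_outputs: list of arrays, one full-precision output per layer (assumed reference).
    candidate_outputs: list of dicts, one per layer, mapping a candidate
        algorithm name (hypothetical labels) to that layer's quantized output.
    Returns the list of chosen algorithm names, one per layer.
    """
    choices = []
    for fp, candidates in zip(fp_outputs, candidate_outputs):
        scores = {name: linear_cka(fp, out) for name, out in candidates.items()}
        choices.append(max(scores, key=scores.get))
    return choices


# Toy usage: random activations with two synthetic "quantized" variants per layer.
# The noise levels stand in for quantization error; they are not real PTQ outputs.
rng = np.random.default_rng(0)
layer_out = rng.normal(size=(64, 8))
candidates = {
    "algo_a": layer_out + 0.01 * rng.normal(size=layer_out.shape),
    "algo_b": layer_out + 1.0 * rng.normal(size=layer_out.shape),
}
chosen = select_per_layer([layer_out], [candidates])
```

Linear CKA is invariant to isotropic scaling and orthogonal transformations of the features, which is one reason it is commonly used to compare layer representations. A real pipeline would replace the synthetic candidates with the outputs of actual PTQ algorithms applied to each layer.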
