CL · AI · Mar 17

Frequency Matters: Fast Model-Agnostic Data Curation for Pruning and Quantization

Francesco Pio Monaco, Elia Cunegatti, Flavio Vella, Giovanni Iacca

arXiv:2603.16105 · 69.01 citations · h-index: 4

Predicted impact: top 92% in CL · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses the critical but often overlooked step of calibration-data curation for model compression. It gives practitioners a fast and effective method for pruning and quantization, though the advance is incremental because it builds on existing compression techniques.

The paper tackles the problem of selecting calibration data for post-training compression of Large Language Models (LLMs) by introducing ZipCal, a model-agnostic strategy based on Zipfian power laws that maximizes lexical diversity. The results show that ZipCal consistently outperforms uniform random sampling on pruning benchmarks, matches a state-of-the-art perplexity-based method in downstream performance, and is on average ~240× faster owing to its linear complexity.

Post-training model compression is essential for enhancing the portability of Large Language Models (LLMs) while preserving their performance. While several compression approaches have been proposed, less emphasis has been placed on selecting the most suitable set of data (the so-called calibration data) for finding the compressed model configuration. The choice of calibration data is a critical step in preserving model capabilities both within and across tasks. In this work, we address the challenge of identifying high-performance calibration sets for both pruning and quantization by analyzing intrinsic data properties rather than model-specific signals. We introduce ZipCal, a model-agnostic data curation strategy that maximizes lexical diversity based on Zipfian power laws. Experiments demonstrate that our method consistently outperforms standard uniform random sampling across various pruning benchmarks. Notably, it also performs on par, in terms of downstream performance, with a state-of-the-art method that relies on model perplexity. The latter becomes prohibitively expensive for large-scale models and datasets, while ZipCal is on average ~240× faster due to its tractable linear complexity. (The code and the experiments are available at https://anonymous.4open.science/r/zipcal-71CD/.)
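The abstract does not spell out how ZipCal works, so the following is only a rough illustration of the general idea of model-agnostic, lexical-diversity-based selection, not the authors' algorithm. It assumes a hypothetical whitespace tokenizer and a hypothetical scoring rule (each distinct word in a sample contributes the inverse of its corpus frequency, so words from the long tail of a roughly Zipfian distribution count more). The names `tokenize` and `select_calibration` are invented for this sketch.

```python
from collections import Counter
import heapq


def tokenize(text):
    # Hypothetical tokenizer (assumption): lowercase whitespace split.
    return text.lower().split()


def select_calibration(samples, k):
    """Return indices of the k samples with the highest rarity-weighted score.

    Illustrative proxy for lexical diversity; NOT the ZipCal procedure.
    """
    token_lists = [tokenize(s) for s in samples]
    # Corpus-wide word counts; in natural text these are roughly Zipfian.
    freq = Counter(tok for toks in token_lists for tok in toks)
    # Each distinct word in a sample adds 1/frequency, so rare words weigh more.
    scores = [sum(1.0 / freq[w] for w in set(toks)) for toks in token_lists]
    # Top-k selection without fully sorting all samples.
    return heapq.nlargest(k, range(len(samples)), key=scores.__getitem__)


samples = [
    "the the the the",
    "the cat sat",
    "quantum pruning calibrates sparse weights",
    "the dog",
]
chosen = select_calibration(samples, 2)
```

No model forward pass is needed here: the cost is one pass over the tokens plus a top-k selection, which is the kind of property that makes data-only curation cheap compared with perplexity-based scoring.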
