LG Feb 27

GRAIL: Post-hoc Compensation by Linear Reconstruction for Compressed Networks

Wenwu Tang, Dong Wang, Lothar Thiele, Olga Saukh

arXiv:2602.23795v12.71 citations, h-index: 76, Has Code

Originality Incremental advance

AI Analysis

GRAIL addresses the need for efficient model compression without costly finetuning, which makes it practical for deployment in resource-constrained settings. The advance is incremental, since it builds on existing compression methods.

The paper tackles the problem of accuracy degradation in aggressively compressed deep models by proposing GRAIL, a post-hoc compensation method that restores block input-output behavior using a small calibration set, improving accuracy or perplexity across ResNets, ViTs, and LLMs without finetuning.

Structured deep model compression methods are hardware-friendly and substantially reduce memory and inference costs. However, under aggressive compression, the resulting accuracy degradation often necessitates post-compression finetuning, which can be impractical due to missing labeled data or high training cost. We propose post-hoc blockwise compensation, called GRAIL, a simple zero-finetuning step applied after model compression that restores each block's input-output behavior using a small calibration set. The method summarizes hidden activations via a Gram matrix and applies ridge regression to linearly reconstruct the original hidden representation from the reduced one. The resulting reconstruction map is absorbed into the downstream projection weights, while the upstream layer is compressed. The approach is selector-agnostic (Magnitude, Wanda, Gram-based selection, or folding), data-aware (requiring only a few forward passes without gradients or labels), and recovers classic pruning or folding when the Gram matrix is near identity, indicating weak inter-channel correlations. Across ResNets, ViTs, and decoder-only LLMs, GRAIL consistently improves accuracy or perplexity over data-free and data-aware pruning or folding baselines in practical compression regimes, with manageable overhead and no backpropagation. The code is available at https://github.com/TWWinde/GRAIL.
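The compensation step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that); it assumes a single linear downstream projection `W_down`, a calibration activation matrix `H` of shape (samples, channels), and a hypothetical selector that simply keeps every other channel. The function name `ridge_compensate` and the ridge strength `lam` are illustrative choices, not names from the paper.

```python
import numpy as np

def ridge_compensate(H, W_down, keep, lam=1e-3):
    """Fold a ridge-regression reconstruction map into the downstream weights.

    H      : (n, d) calibration activations of the uncompressed layer
    W_down : (d, m) downstream projection, applied as y = H @ W_down
    keep   : indices of the k channels that survive compression
    lam    : ridge strength (an assumed hyperparameter)
    """
    G = H.T @ H                            # Gram matrix of hidden activations, (d, d)
    G_kk = G[np.ix_(keep, keep)]           # kept-vs-kept block, (k, k)
    G_kd = G[keep, :]                      # kept-vs-all block, (k, d)
    # B solves min_B ||H[:, keep] @ B - H||^2 + lam * ||B||^2
    B = np.linalg.solve(G_kk + lam * np.eye(len(keep)), G_kd)
    return B @ W_down                      # compensated downstream weights, (k, m)

rng = np.random.default_rng(0)
n, d, m = 512, 16, 8

# Synthetic, strongly correlated channels: a low-rank mixture plus small noise.
Z = rng.standard_normal((n, 6))
H = Z @ rng.standard_normal((6, d)) + 0.05 * rng.standard_normal((n, d))
W_down = rng.standard_normal((d, m))
keep = np.arange(0, d, 2)                  # hypothetical selector: every other channel

Y_ref = H @ W_down                         # original block output
Y_pruned = H[:, keep] @ W_down[keep, :]    # plain pruning, no compensation
W_comp = ridge_compensate(H, W_down, keep)
Y_comp = H[:, keep] @ W_comp               # pruning plus linear compensation

err_pruned = np.linalg.norm(Y_ref - Y_pruned)
err_comp = np.linalg.norm(Y_ref - Y_comp)
print(f"pruned error: {err_pruned:.3f}, compensated error: {err_comp:.3f}")

# With an identity Gram matrix (orthonormal channels) and lam = 0,
# the reconstruction map reduces to channel selection, i.e. plain pruning.
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))
W_identity = ridge_compensate(Q, W_down, keep, lam=0.0)
```

Only forward-pass activations enter the computation, so no labels or gradients are needed. The identity-Gram case at the end mirrors the abstract's remark that the method recovers classic pruning when inter-channel correlations are weak.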
