LG | AI | Feb 20

Cut Less, Fold More: Model Compression through the Lens of Projection Geometry

arXiv:2602.18116v11 citations
Originality: Incremental advance
AI Analysis

The paper provides a geometry-aware, calibration-free compression method for deploying neural networks at scale. The advance is incremental, since it builds on existing pruning and clustering techniques.

The paper tackles neural network compression without retraining by comparing structured pruning and model folding through projection geometry. It shows that folding typically achieves higher post-compression accuracy than pruning across diverse models and datasets, with the largest gains at moderate-to-high compression levels.

Compressing neural networks without retraining is vital for deployment at scale. We study calibration-free compression through the lens of projection geometry: structured pruning is an axis-aligned projection, whereas model folding performs a low-rank projection via weight clustering. We formalize both as orthogonal operators and show that, within a rank distance of one, folding provably yields smaller parameter reconstruction error, and under mild smoothness assumptions, smaller functional perturbations than pruning. At scale, we evaluate >1000 checkpoints spanning ResNet18, PreActResNet18, ViT-B/32, and CLIP ViT-B/32 on CIFAR-10 and ImageNet-1K, covering diverse training hyperparameters (optimizers, learning rates, augmentations, regularization, sharpness-aware training), as well as multiple LLaMA-family 60M and 130M parameter models trained on C4. We show that folding typically achieves higher post-compression accuracy, with the largest gains at moderate-high compression. The gap narrows and occasionally reverses at specific training setups. Our results position folding as a geometry-aware, calibration-free alternative to pruning that is often superior in practice and principled in theory.
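The two projections described in the abstract can be illustrated with a small NumPy sketch. It is not the paper's implementation: the helper names `prune_projection` and `fold_projection`, the norm-based pruning criterion, and the hand-built clustering (each kept row is its own cluster, and all pruned rows share one extra cluster) are assumptions chosen for illustration. Under this assumed clustering the folding projector has rank k + 1 while the pruning projector has rank k, which is one possible reading of "within a rank distance of one".

```python
import numpy as np


def prune_projection(W, k):
    """Axis-aligned projector: keep the k rows of W with the largest L2 norm.

    The norm-based selection rule is an assumption; other criteria are possible.
    """
    norms = np.linalg.norm(W, axis=1)
    keep = np.argsort(norms)[-k:]
    mask = np.zeros(W.shape[0])
    mask[keep] = 1.0
    return np.diag(mask), keep


def fold_projection(labels, n_clusters):
    """Cluster-averaging projector P = M (M^T M)^{-1} M^T for a hard assignment.

    Applying P to W replaces every row with the mean of its cluster.
    Every cluster must contain at least one row.
    """
    n = labels.shape[0]
    M = np.zeros((n, n_clusters))
    M[np.arange(n), labels] = 1.0
    counts = M.sum(axis=0)          # rows per cluster, i.e. diag(M^T M)
    return (M / counts) @ M.T


rng = np.random.default_rng(0)
W = rng.normal(size=(8, 5))         # toy weight matrix: 8 neurons, 5 inputs
k = 4                               # neurons kept by pruning

P_prune, keep = prune_projection(W, k)

# Assumed clustering: kept rows are singletons, pruned rows share cluster k.
labels = np.full(W.shape[0], k)
labels[keep] = np.arange(k)
P_fold = fold_projection(labels, k + 1)

# Squared Frobenius reconstruction error of each projection.
err_prune = np.linalg.norm(W - P_prune @ W) ** 2
err_fold = np.linalg.norm(W - P_fold @ W) ** 2

print("pruning error:", err_prune)
print("folding error:", err_fold)
```

In this construction the folding error cannot exceed the pruning error, because the mean of the pruned rows is at least as close to them (in squared distance) as the zero vector that pruning substitutes. A practical folding method would choose the clusters with a clustering algorithm rather than by hand.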

