Motion-Compensated Weight Compression

arXiv:2605.24754 11.5 Has Code

AI Analysis

For practitioners deploying large neural networks, MCWC lowers storage and bandwidth costs at a given accuracy level, improving on existing quantization and learned weight-codec methods.

MCWC improves weight compression by exploiting cross-layer redundancy via permutation alignment, achieving better rate-accuracy trade-offs than quantization and learned codec baselines across Transformer and vision models.

Neural network weights are increasingly a bottleneck for deployment, yet most compression pipelines treat layers independently and overlook cross-layer redundancy induced by function-preserving symmetries. We propose Motion-Compensated Weight Compression (MCWC), a weight-only codec that aligns permutation-symmetric blocks (e.g., hidden units and attention heads) to maximize cross-layer correspondence, turning depth into a predictable sequence. In the aligned coordinate system, MCWC uses a lightweight layer-sequential predictor with periodic keyframes and encodes only quantized prediction residuals using a learned entropy model trained under a rate-distortion objective. A simple decoder reconstructs deployable weights by entropy decoding, dequantization, predictor-driven reconstruction, and inverse alignment, enabling fast weight materialization for inference. Across Transformer language modeling and vision classification, MCWC improves the rate-accuracy Pareto frontier over strong quantization and learned weight-codec baselines, while maintaining competitive decode time. Ablations confirm that alignment, prediction, entropy modeling, and keyframe scheduling are each necessary for the full gains. Our code is available at https://github.com/Ism-ail11/MCWC.
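
The pipeline described in the abstract (align, predict from the previous layer, code keyframes directly, quantize residuals, then invert everything at decode time) can be illustrated with a toy sketch. This is not the authors' implementation. Every function name here (`align_rows`, `encode`, `decode`, `empirical_bits`, `make_toy_layers`) is hypothetical, and several pieces are simplified stand-ins: a greedy nearest-row matching replaces whatever alignment procedure MCWC uses, the predictor is simply "copy the previous reconstructed layer", a uniform scalar quantizer is assumed, and an empirical zeroth-order entropy estimate stands in for the learned entropy model. All layers are assumed to have the same shape, and the cost of storing the permutations is not counted.

```python
import math
import random
from collections import Counter


def align_rows(ref, layer):
    """Greedy stand-in for permutation alignment: match each row of `ref`
    to the closest still-unused row of `layer` (squared distance)."""
    unused = list(range(len(layer)))
    perm = []
    for r in ref:
        best = min(unused, key=lambda j: sum((a - b) ** 2 for a, b in zip(r, layer[j])))
        perm.append(best)
        unused.remove(best)
    return perm  # aligned row i is layer[perm[i]]


def reconstruct(pred, symbols, step):
    """Dequantize residual symbols and add them to the prediction."""
    return [[p + s * step for p, s in zip(prow, srow)] for prow, srow in zip(pred, symbols)]


def encode(layers, step=0.05, keyframe_every=4):
    stream, prev = [], None
    for idx, layer in enumerate(layers):
        is_key = idx % keyframe_every == 0
        perm = list(range(len(layer))) if prev is None else align_rows(prev, layer)
        aligned = [layer[j] for j in perm]
        # Keyframes are coded without prediction; other layers predict from the
        # previous *reconstructed* layer so encoder and decoder stay in sync.
        pred = [[0.0] * len(row) for row in aligned] if is_key else prev
        symbols = [[round((a - p) / step) for a, p in zip(arow, prow)]
                   for arow, prow in zip(aligned, pred)]
        stream.append((is_key, perm, symbols))
        prev = reconstruct(pred, symbols, step)
    return stream


def decode(stream, step=0.05):
    layers, prev = [], None
    for is_key, perm, symbols in stream:
        pred = [[0.0] * len(row) for row in symbols] if is_key else prev
        recon = reconstruct(pred, symbols, step)
        restored = [None] * len(perm)
        for i, j in enumerate(perm):  # inverse alignment: undo the permutation
            restored[j] = recon[i]
        layers.append(restored)
        prev = recon
    return layers


def empirical_bits(stream):
    """Zeroth-order entropy of all residual symbols, in bits (a crude rate proxy)."""
    syms = [s for _, _, rows in stream for row in rows for s in row]
    counts, n = Counter(syms), len(syms)
    return -sum(c * math.log2(c / n) for c in counts.values())


def make_toy_layers(n_layers=6, rows=8, cols=5, seed=0):
    """Toy data: each layer is a slightly perturbed, row-shuffled copy of the last."""
    rng = random.Random(seed)
    base = [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]
    layers = []
    for _ in range(n_layers):
        base = [[w + rng.gauss(0, 0.02) for w in row] for row in base]
        shuffled = base[:]
        rng.shuffle(shuffled)
        layers.append(shuffled)
    return layers


layers = make_toy_layers()
stream = encode(layers)
decoded = decode(stream)
max_err = max(abs(a - b) for lay, dec in zip(layers, decoded)
              for ra, rb in zip(lay, dec) for a, b in zip(ra, rb))
print(f"max abs error: {max_err:.4f}, approx bits: {empirical_bits(stream):.0f}")
```

Because the encoder predicts from reconstructed rather than original weights, the per-weight error in this sketch stays within half a quantization step regardless of depth. In the toy data, rows are shuffled between layers, so residuals are small only after alignment, which is the intuition behind treating depth as a predictable sequence.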
