CV · Mar 11, 2025

SSVQ: Unleashing the Potential of Vector Quantization with Sign-Splitting

arXiv:2503.08668v2 · 1 citation · h-index: 14 · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses a specific bottleneck in weight compression for machine learning models, offering improved fine-tuning performance and hardware efficiency. It is an incremental advance over existing VQ methods.

The paper tackles a limitation of Vector Quantization (VQ) during fine-tuning: weight vectors assigned to the same codeword are forced to update in the same direction, often contrary to their local gradient information. It introduces Sign-Splitting VQ (SSVQ), which decouples the sign bit of each weight from the codebook and jointly optimizes the signs and the codebook. The reported result is a significantly better compression-accuracy trade-off than conventional VQ and a 3× speedup over 8-bit compressed models on a hardware accelerator.
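The shared-direction constraint can be illustrated with a small numerical sketch. It assumes the standard chain-rule view that a codeword's gradient is the sum of the gradients of all weight vectors assigned to it. The gradient values are random placeholders, not data from the paper, and the names `grads`, `codeword_grad`, and `conflict_fraction` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical local gradients for 6 weight vectors (dimension 4)
# that are all assigned to the same codeword.
grads = rng.normal(size=(6, 4))

# Every assigned vector is a copy of the codeword, so by the chain rule
# the codeword's gradient is the sum of the per-vector gradients.
codeword_grad = grads.sum(axis=0)

# All 6 vectors receive the same update direction (-codeword_grad).
# An entry "conflicts" when its own gradient sign differs from the shared one.
conflict = np.sign(grads) != np.sign(codeword_grad)
conflict_fraction = conflict.mean()

print(f"fraction of entries updated against their local gradient: {conflict_fraction:.2f}")
```

Any entry flagged in `conflict` is moved in a direction its own gradient does not ask for, which is the effect the summary describes.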

Vector Quantization (VQ) has emerged as a prominent weight compression technique, showcasing substantially lower quantization errors than uniform quantization across diverse models, particularly in extreme compression scenarios. However, its efficacy during fine-tuning is limited by the constraint of the compression format, where weight vectors assigned to the same codeword are restricted to updates in the same direction. Consequently, many quantized weights are compelled to move in directions contrary to their local gradient information. To mitigate this issue, we introduce a novel VQ paradigm, Sign-Splitting VQ (SSVQ), which decouples the sign bit of weights from the codebook. Our approach involves extracting the sign bits of uncompressed weights and performing clustering and compression on all-positive weights. We then introduce latent variables for the sign bit and jointly optimize both the signs and the codebook. Additionally, we implement a progressive freezing strategy for the learnable sign to ensure training stability. Extensive experiments on various modern models and tasks demonstrate that SSVQ achieves a significantly superior compression-accuracy trade-off compared to conventional VQ. Furthermore, we validate our algorithm on a hardware accelerator, showing that SSVQ achieves a 3$\times$ speedup over the 8-bit compressed model by reducing memory access. Our code is available at https://github.com/list0830/SSVQ.
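The compression step described in the abstract (extract sign bits, then cluster the all-positive weights) might look roughly like the following NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the plain k-means routine, the vector dimension `dim=4`, the codebook size `k=16`, and the function names `kmeans`, `ssvq_compress`, and `ssvq_decompress` are all hypothetical. It also omits the learnable latent sign variables, the joint optimization, and the progressive freezing strategy.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centers, assignments)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dist.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    # Recompute assignments against the final centers.
    dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return centers, dist.argmin(axis=1)

def ssvq_compress(weight, dim=4, k=16):
    """Split signs from magnitudes, then cluster the magnitudes only."""
    vectors = weight.reshape(-1, dim)
    signs = np.where(vectors < 0, -1.0, 1.0)   # one sign bit per weight
    codebook, idx = kmeans(np.abs(vectors), k)  # all-positive codebook
    return signs, codebook, idx

def ssvq_decompress(signs, codebook, idx, shape):
    """Rebuild weights as sign * codeword magnitude."""
    return (signs * codebook[idx]).reshape(shape)

rng = np.random.default_rng(1)
W = rng.normal(size=(64, 32))                  # placeholder weight matrix
signs, codebook, idx = ssvq_compress(W)
W_hat = ssvq_decompress(signs, codebook, idx, W.shape)
print("reconstruction MSE:", float(((W - W_hat) ** 2).mean()))
```

Because the signs are stored separately, two weight vectors that share a codeword can still differ in sign per element, which is the degree of freedom the abstract says is then optimized jointly with the codebook.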

