Soft Quantization: Model Compression Via Weight Coupling
This work provides a new pipeline for the flexible compression of machine learning models, addressing model size reduction for deployment, though it appears incremental because it builds on existing quantization methods.
The paper tackles model compression by introducing short-range attractive couplings between neural network weights during training. The couplings discretize the weight distribution in a mixed-precision manner, and within an appropriate range of hyperparameters the scheme outperforms histogram-equalized post-training quantization on ResNet-20/CIFAR-10.
We show that introducing short-range attractive couplings between the weights of a neural network during training provides a novel avenue for model quantization. These couplings rapidly induce the discretization of a model's weight distribution, and they do so in a mixed-precision manner despite only relying on two additional hyperparameters. We demonstrate that, within an appropriate range of hyperparameters, our "soft quantization" scheme outperforms histogram-equalized post-training quantization on ResNet-20/CIFAR-10. Soft quantization provides both a new pipeline for the flexible compression of machine learning models and a new tool for investigating the trade-off between compression and generalization in high-dimensional loss landscapes.
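To make the coupling idea concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give the functional form of the coupling, so the Gaussian-well pair potential used here is an assumption. So are the names `coupling_energy`, `coupling_gradient`, `relax`, and `count_levels`, and the reading of the two additional hyperparameters as a coupling `strength` and an interaction `width`. In training, a term like this would presumably be added to the task loss; the sketch applies gradient descent to the coupling term alone, on a handful of scalar weights, to show how nearby weights merge into shared levels.

```python
import math


def coupling_energy(weights, strength, width):
    """Assumed pair potential: one Gaussian well per pair of weights.

    The energy is lowest when weights coincide, and the interaction
    vanishes quickly once two weights are further apart than `width`.
    """
    total = 0.0
    for i in range(len(weights)):
        for j in range(i + 1, len(weights)):
            d = weights[i] - weights[j]
            total -= strength * math.exp(-d * d / (2.0 * width * width))
    return total


def coupling_gradient(weights, strength, width):
    """Analytic gradient of coupling_energy with respect to each weight."""
    grads = [0.0] * len(weights)
    for i in range(len(weights)):
        for j in range(len(weights)):
            if i != j:
                d = weights[i] - weights[j]
                well = math.exp(-d * d / (2.0 * width * width))
                grads[i] += strength * d / (width * width) * well
    return grads


def relax(weights, strength, width, lr=1e-3, steps=2000):
    """Plain gradient descent on the coupling term only (no task loss)."""
    w = list(weights)
    for _ in range(steps):
        g = coupling_gradient(w, strength, width)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def count_levels(weights, tol=1e-6):
    """Count distinct values, treating weights closer than `tol` as equal."""
    levels = 0
    previous = None
    for w in sorted(weights):
        if previous is None or w - previous > tol:
            levels += 1
        previous = w
    return levels


# Toy weights: three loose groups whose spacing exceeds the coupling width.
initial = [-1.02, -0.98, -0.95, 0.0, 0.03, 0.97, 1.0, 1.04]
final = relax(initial, strength=1.0, width=0.1)
print(count_levels(initial), "distinct values before,",
      count_levels(final), "after")
```

Because the assumed interaction is short-range, each group collapses onto its own level while the groups stay well separated from one another. In this toy setting, the number of surviving levels is set by how the weights are distributed relative to `width` rather than by a fixed bit budget.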