LG · Dec 12, 2023

IDKM: Memory Efficient Neural Network Quantization via Implicit, Differentiable k-Means

Sean Jaffe, Ambuj K. Singh, Francesco Bullo

arXiv:2312.07759v2 · 2.0 · h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses memory constraints for deploying compressed neural networks on edge devices, offering an incremental improvement over existing quantization methods.

The paper tackles the memory inefficiency of differentiable k-means (DKM) for neural network quantization. It proposes an implicit variant (IDKM) that reduces the memory complexity of a single k-means layer from O(t * m * 2^b) to O(m * 2^b), matches DKM's performance with less compute time and memory, and enables training on hardware where DKM fails.
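As a rough illustration of the complexity claim, the sketch below counts only the entries of the m × 2^b attention matrix, which DKM keeps for each of its t unrolled iterations and which IDKM needs only once. The layer sizes (m = 1,000,000, b = 4, t = 30) and float32 storage are hypothetical assumptions. Centroids, activations, and constant factors are ignored, so the figures are back-of-the-envelope, not measurements from the paper.

```python
BYTES_PER_FLOAT32 = 4


def attention_floats(m: int, b: int, t: int = 1) -> int:
    """Entries in t stored attention matrices of shape m x 2^b."""
    return t * m * (2 ** b)


def attention_bytes(m: int, b: int, t: int) -> tuple:
    """(DKM-style, IDKM-style) bytes for the attention matrices only.

    Assumption: DKM retains one matrix per unrolled iteration (t of them),
    while the implicit formulation retains a single one.
    """
    dkm = attention_floats(m, b, t) * BYTES_PER_FLOAT32
    idkm = attention_floats(m, b) * BYTES_PER_FLOAT32
    return dkm, idkm


if __name__ == "__main__":
    # Hypothetical layer: 1M scalar weights, 4-bit addresses, 30 iterations.
    dkm_bytes, idkm_bytes = attention_bytes(m=1_000_000, b=4, t=30)
    print(f"DKM  ~ {dkm_bytes / 1e9:.2f} GB")
    print(f"IDKM ~ {idkm_bytes / 1e6:.1f} MB")
```

Under these assumptions the ratio between the two is exactly t, the factor the implicit formulation removes.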

Compressing large neural networks with minimal performance loss is crucial to enabling their deployment on edge devices. Cho et al. (2022) proposed a weight quantization method that uses an attention-based clustering algorithm called differentiable $k$-means (DKM). Despite achieving state-of-the-art results, DKM's performance is constrained by its heavy memory dependency. We propose an implicit, differentiable $k$-means algorithm (IDKM), which eliminates the major memory restriction of DKM. Let $t$ be the number of $k$-means iterations, $m$ be the number of weight-vectors, and $b$ be the number of bits per cluster address. IDKM reduces the overall memory complexity of a single $k$-means layer from $\mathcal{O}(t \cdot m \cdot 2^b)$ to $\mathcal{O}(m \cdot 2^b)$. We also introduce a variant, IDKM with Jacobian-Free-Backpropagation (IDKM-JFB), for which the time complexity of the gradient calculation is independent of $t$ as well. We provide a proof of concept of our methods by showing that, under the same settings, IDKM achieves comparable performance to DKM with less compute time and less memory. We also use IDKM and IDKM-JFB to quantize a large neural network, ResNet18, on hardware where DKM cannot train at all.
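To make the attention-based clustering concrete, here is a minimal pure-Python sketch of a soft $k$-means fixed point for scalar weights, in the spirit of DKM. Each weight attends to the $k = 2^b$ centroids through a softmax over negative distances with temperature tau, and each centroid moves to the attention-weighted mean of the weights. All function names, the squared-distance choice, and the convergence test are assumptions for illustration, not the authors' code. The loop keeps only the current attention matrix; an autograd framework unrolling $t$ such steps (as DKM does) would retain all $t$ of them. Only the forward pass is shown: IDKM would obtain gradients by implicit differentiation at the fixed point, and IDKM-JFB by Jacobian-free backpropagation, neither of which is implemented here.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def soft_assign(weights: List[float], centroids: List[float], tau: float) -> Matrix:
    """Attention matrix A (m x k): row i is softmax_j(-(w_i - c_j)^2 / tau).

    Squared distance is an assumption of this sketch.
    """
    rows = []
    for w in weights:
        scores = [-((w - c) ** 2) / tau for c in centroids]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def update_centroids(weights: List[float], attn: Matrix, old: List[float]) -> List[float]:
    """c_j = sum_i A_ij w_i / sum_i A_ij; keep c_j if it received no mass."""
    new = []
    for j, c_old in enumerate(old):
        den = sum(row[j] for row in attn)
        num = sum(row[j] * w for row, w in zip(attn, weights))
        new.append(num / den if den > 0.0 else c_old)
    return new


def soft_kmeans_fixed_point(
    weights: List[float],
    centroids: List[float],
    tau: float = 0.01,
    tol: float = 1e-9,
    max_iter: int = 500,
) -> Tuple[List[float], int]:
    """Iterate c <- F(c) until max_j |F(c)_j - c_j| < tol.

    Each pass overwrites `attn`, so only one m x k matrix is alive at a time,
    independent of the number of iterations.
    """
    c = list(centroids)
    for it in range(1, max_iter + 1):
        attn = soft_assign(weights, c, tau)
        c_next = update_centroids(weights, attn, c)
        if max(abs(a - b) for a, b in zip(c_next, c)) < tol:
            return c_next, it
        c = c_next
    return c, max_iter


def soft_quantize(weights: List[float], centroids: List[float], tau: float) -> List[float]:
    """Soft-quantized weights: w_hat_i = sum_j A_ij c_j."""
    attn = soft_assign(weights, centroids, tau)
    return [sum(a * c for a, c in zip(row, centroids)) for row in attn]


if __name__ == "__main__":
    # Hypothetical toy layer: six scalar weights, b = 1 bit (k = 2 clusters).
    weights = [-0.9, -1.1, -1.0, 0.95, 1.05, 1.0]
    centroids, iters = soft_kmeans_fixed_point(weights, [-0.5, 0.5], tau=0.01)
    print(centroids, iters)
    print(soft_quantize(weights, centroids, tau=0.01))
```

On this toy input the centroids settle near -1.0 and 1.0 after a couple of passes. A small tau makes the soft assignment close to hard $k$-means, while a larger tau spreads attention across clusters.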

