LG, AI, CV · Aug 28, 2021

DKM: Differentiable K-Means Clustering Layer for Neural Network Compression

arXiv:2108.12659v4 · 39 citations
Originality: Highly original
AI Analysis

The paper addresses the need to reduce memory requirements and keep user data on-device in AI applications, and it proposes a novel method rather than an incremental improvement.

The paper tackles deep neural network compression for efficient on-device inference. It proposes a differentiable k-means clustering layer (DKM) that enables joint optimization of the network parameters and the clustering centroids, and reports superior compression-accuracy trade-offs: for example, 74.5% top-1 accuracy on ResNet50 at a 29.4x compression factor and, on MobileNet-v1, 6.8% higher top-1 accuracy at a 33% smaller model size than state-of-the-art methods.
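The compression factors quoted above compare the original fp32 model with its clustered form. As a rough, hypothetical illustration (not the paper's own size accounting), the sketch below estimates the storage of a single weight-clustered tensor, assuming every fp32 weight is replaced by a ceil(log2 k)-bit index into a codebook of k fp32 centroids. The helper names `clustered_size_bits` and `compression_factor` and the 1M-weight layer are made up for the example; whole-model factors such as 29.4x also depend on which layers are clustered and at what bit width, which this sketch does not model.

```python
import math


def clustered_size_bits(num_weights: int, num_clusters: int, float_bits: int = 32) -> int:
    """Hypothetical storage cost of one clustered tensor:
    a ceil(log2(k))-bit index per weight plus a codebook of k floats."""
    index_bits = math.ceil(math.log2(num_clusters))
    return num_weights * index_bits + num_clusters * float_bits


def compression_factor(num_weights: int, num_clusters: int, float_bits: int = 32) -> float:
    """Ratio of the uncompressed size to the clustered size."""
    original_bits = num_weights * float_bits
    return original_bits / clustered_size_bits(num_weights, num_clusters, float_bits)


# A hypothetical layer with one million fp32 weights.
print(f"{compression_factor(1_000_000, 2):.2f}")  # 1-bit clustering → 32.00
print(f"{compression_factor(1_000_000, 4):.2f}")  # 2-bit clustering → 16.00
```

For large tensors the codebook overhead is negligible, so the per-tensor factor approaches 32 / log2(k): roughly 32x for 1-bit (k = 2) and 16x for 2-bit (k = 4) clustering.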

Deep neural network (DNN) model compression for efficient on-device inference is becoming increasingly important to reduce memory requirements and keep user data on-device. To this end, we propose a novel differentiable k-means clustering layer (DKM) and its application to train-time weight clustering-based DNN model compression. DKM casts k-means clustering as an attention problem and enables joint optimization of the DNN parameters and clustering centroids. Unlike prior works that rely on additional regularizers and parameters, DKM-based compression keeps the original loss function and model architecture fixed. We evaluated DKM-based compression on various DNN models for computer vision and natural language processing (NLP) tasks. Our results demonstrate that DKM delivers a superior compression-accuracy trade-off on the ImageNet1k and GLUE benchmarks. For example, DKM-based compression can offer 74.5% top-1 ImageNet1k accuracy on the ResNet50 DNN model with a 3.3 MB model size (29.4x model compression factor). For MobileNet-v1, which is a challenging DNN to compress, DKM delivers 63.9% top-1 ImageNet1k accuracy with a 0.72 MB model size (22.4x model compression factor). This result has 6.8% higher top-1 accuracy and a 33% relatively smaller model size than the current state-of-the-art DNN compression algorithms. Additionally, DKM enables compression of the DistilBERT model by 11.8x with minimal (1.1%) accuracy loss on the GLUE NLP benchmarks.
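The abstract's central idea, treating k-means clustering as attention, can be sketched in a few lines. This is a minimal, forward-only illustration under stated assumptions, not the authors' implementation: it clusters scalar weights, uses the negative absolute distance divided by a temperature `tau` as the attention logits, and runs a fixed number of soft k-means iterations. The names `soft_assign` and `dkm_forward`, the distance choice, and the default `tau` and `iters` values are all hypothetical.

```python
import math
from typing import List, Tuple


def soft_assign(weights: List[float], centroids: List[float], tau: float) -> List[List[float]]:
    """Attention matrix A, where A[i][j] = softmax_j(-|w_i - c_j| / tau)."""
    attn = []
    for w in weights:
        logits = [-abs(w - c) / tau for c in centroids]
        m = max(logits)  # subtract the max logit for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        attn.append([e / total for e in exps])
    return attn


def dkm_forward(weights: List[float], centroids: List[float],
                tau: float = 0.05, iters: int = 10) -> Tuple[List[float], List[float]]:
    """Hypothetical forward pass of an attention-style soft k-means layer.

    Alternates attention and centroid updates, then returns the
    soft-clustered weights (A @ C) together with the final centroids.
    """
    c = list(centroids)
    for _ in range(iters):
        attn = soft_assign(weights, c, tau)
        # Each centroid becomes the attention-weighted mean of the weights;
        # the epsilon avoids division by zero if a centroid gets no attention.
        c = [
            sum(a[j] * w for a, w in zip(attn, weights)) / (sum(a[j] for a in attn) + 1e-12)
            for j in range(len(c))
        ]
    attn = soft_assign(weights, c, tau)
    # Each weight is replaced by its attention-weighted mix of centroids.
    clustered = [sum(a_j * c_j for a_j, c_j in zip(row, c)) for row in attn]
    return clustered, c


weights = [-1.02, -0.98, -1.0, 0.99, 1.01, 1.0]
clustered, centroids = dkm_forward(weights, [-0.5, 0.5])
print([round(x, 2) for x in clustered])  # → [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
```

Lowering `tau` pushes each row of A toward a one-hot assignment (hard k-means), while larger values keep assignments soft. Because every step is built from differentiable operations, an autodiff framework could backpropagate a task loss through the clustered weights to both the original weights and the centroids, which is the kind of joint optimization the abstract describes; this pure-Python sketch computes only the forward pass.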

