CV · LG · Sep 25, 2019

CAT: Compression-Aware Training for bandwidth reduction

Chaim Baskin, Brian Chmiel, Evgenii Zheltonozhskii, Ron Banner, Alex M. Bronstein, Avi Mendelson

arXiv:1909.11481v1 · 10.214 citations · Has Code

Originality: Highly original

AI Analysis

This paper addresses memory bandwidth reduction for hardware accelerators running visual processing tasks, offering a training approach that makes feature maps easier to compress at inference time.

The paper tackles the problem of high memory bandwidth requirements in convolutional neural networks (CNNs) for inference by proposing a compression-aware training (CAT) method, achieving 73.1% accuracy with only 1.79 bits per value on ResNet-34, a 0.2% degradation from baseline.

Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving visual processing tasks. One of the major obstacles hindering the ubiquitous use of CNNs for inference is their relatively high memory bandwidth requirements, which can be a main energy consumer and throughput bottleneck in hardware accelerators. Accordingly, an efficient feature map compression method can result in substantial performance gains. Inspired by quantization-aware training approaches, we propose a compression-aware training (CAT) method that involves training the model in a way that allows better compression of feature maps during inference. Our method trains the model to achieve low-entropy feature maps, which enables efficient compression at inference time using classical transform coding methods. CAT significantly improves the state-of-the-art results reported for quantization. For example, on ResNet-34 we achieve 73.1% accuracy (0.2% degradation from the baseline) with an average representation of only 1.79 bits per value. A reference implementation accompanies the paper at https://github.com/CAT-teams/CAT.
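To make the "bits per value" figure concrete, the sketch below estimates the empirical Shannon entropy of a quantized feature map, which lower-bounds the average code length an ideal lossless entropy coder could reach on those symbols. It is an illustration only, not the authors' implementation: the helper names `quantize` and `entropy_bits_per_value`, the 4-bit uniform quantizer, and the toy activation values are all assumptions made for this example.

```python
import math
from collections import Counter


def quantize(values, num_bits, max_value):
    """Uniformly quantize non-negative activations to integers in [0, 2**num_bits - 1].

    Hypothetical helper: a plain uniform quantizer, assumed here for illustration.
    """
    levels = (1 << num_bits) - 1
    step = max_value / levels
    return [min(levels, max(0, round(v / step))) for v in values]


def entropy_bits_per_value(symbols):
    """Empirical Shannon entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy post-activation feature maps (made-up values, not data from the paper).
# "spread" uses all 16 quantization levels equally often.
spread = [(i % 16) * 0.25 for i in range(1024)]
# "sparse" is mostly zeros with a few large activations.
sparse = [0.0] * 896 + [3.75] * 128

q_spread = quantize(spread, num_bits=4, max_value=3.75)
q_sparse = quantize(sparse, num_bits=4, max_value=3.75)

print(f"spread: {entropy_bits_per_value(q_spread):.2f} bits/value")  # 4.00
print(f"sparse: {entropy_bits_per_value(q_sparse):.2f} bits/value")  # 0.54
```

Both toy maps are stored with 4-bit values, yet the sparse one carries roughly 0.54 bits of information per value, so an entropy coder can shrink it far below its nominal width. Training that pushes feature maps toward such low-entropy distributions is the effect the abstract describes.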
