LG | Jan 28, 2023

Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic Programming

arXiv:2301.12187v2 | 2 citations | h-index: 22
Originality: Incremental advance
AI Analysis

This work addresses the need for faster, more efficient CNN inference in mobile and edge applications. It is an incremental advance that builds on prior depth compression methods.

The paper reduces CNN inference latency with a depth compression algorithm that replaces inefficient activations with identity functions and merges the resulting consecutive convolution layers, achieving a 1.41x speed-up with a 0.11 percentage-point accuracy gain for MobileNetV2-1.0 on ImageNet.
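
The merging step mentioned above relies on the fact that two convolutions with no nonlinearity between them compose into a single, larger convolution. The following is a minimal NumPy sketch of that identity, not the paper's implementation: it assumes stride 1, no padding, no bias, and square kernels, and the helper names `conv2d` and `merge_convs` are hypothetical.

```python
import numpy as np

def conv2d(x, w):
    """Valid, stride-1 cross-correlation.
    x: (C_in, H, W), w: (C_out, C_in, k, k) -> (C_out, H-k+1, W-k+1)."""
    c_out, c_in, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + k, j:j + k]
            # Contract over input channels and both kernel axes.
            y[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return y

def merge_convs(w1, w2):
    """Merge conv(w1) followed by conv(w2) into one kernel of size k1 + k2 - 1.
    Assumes the activation between the two layers is the identity."""
    c_mid, c_in, k1, _ = w1.shape
    c_out, _, k2, _ = w2.shape
    k = k1 + k2 - 1
    merged = np.zeros((c_out, c_in, k, k))
    for u in range(k2):
        for v in range(k2):
            # (c_out, c_mid) x (c_mid, c_in, k1, k1) -> (c_out, c_in, k1, k1)
            merged[:, :, u:u + k1, v:v + k1] += np.tensordot(
                w2[:, :, u, v], w1, axes=([1], [0])
            )
    return merged

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))
w1 = rng.standard_normal((4, 3, 3, 3))   # first 3x3 conv: 3 -> 4 channels
w2 = rng.standard_normal((5, 4, 3, 3))   # second 3x3 conv: 4 -> 5 channels

two_step = conv2d(conv2d(x, w1), w2)       # original two-layer path
one_step = conv2d(x, merge_convs(w1, w2))  # single merged 5x5 conv
print(np.allclose(two_step, one_step))
```

The two outputs should agree up to floating-point tolerance. Note that the merged kernel is larger (5x5 here), so merging does not always reduce latency, which is presumably why the choice of what to merge is treated as an optimization problem.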

Recent works on neural network pruning advocate that reducing the depth of the network is more effective at reducing run-time memory usage and inference latency than reducing the width of the network through channel pruning. In this regard, some recent works propose depth compression algorithms that merge convolution layers. However, the existing algorithms have a restricted search space and rely on human-engineered heuristics. In this paper, we propose a novel depth compression algorithm which targets general convolution operations. We propose a subset selection problem that replaces inefficient activation layers with identity functions and optimally merges consecutive convolution operations into shallow equivalent convolution operations to reduce end-to-end inference latency. Since the proposed subset selection problem is NP-hard, we formulate a surrogate optimization problem that can be solved exactly via two-stage dynamic programming within a few seconds. We evaluate our methods and baselines with TensorRT for a fair inference latency comparison. Our method outperforms the baseline method with higher accuracy and faster inference speed in MobileNetV2 on the ImageNet dataset. Specifically, we achieve a $1.41\times$ speed-up with a $0.11$\%p accuracy gain for MobileNetV2-1.0 on ImageNet.
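
To give a feel for why dynamic programming suits this kind of problem, here is a simplified sketch that chooses where to cut a chain of layers into merged blocks so that total latency is minimal. It is an illustration under stated assumptions, not the paper's two-stage formulation: the function `min_latency_partition` and the latency table are hypothetical, and accuracy is ignored entirely.

```python
def min_latency_partition(block_latency, n_layers):
    """Split layers 0..n_layers-1 into contiguous merged blocks of minimal total latency.

    block_latency[(i, j)] is an assumed, pre-measured latency of merging
    layers i..j-1 into one convolution; missing keys mean "cannot merge".
    """
    inf = float("inf")
    best = [inf] * (n_layers + 1)   # best[j]: minimal latency of the first j layers
    prev = [-1] * (n_layers + 1)    # prev[j]: start index of the last block
    best[0] = 0.0
    for j in range(1, n_layers + 1):
        for i in range(j):
            cost = best[i] + block_latency.get((i, j), inf)
            if cost < best[j]:
                best[j], prev[j] = cost, i
    if best[n_layers] == inf:
        raise ValueError("no feasible partition for the given latency table")
    # Walk back through prev to recover the chosen block boundaries.
    blocks, j = [], n_layers
    while j > 0:
        blocks.append((prev[j], j))
        j = prev[j]
    return best[n_layers], blocks[::-1]

# Hypothetical latencies (ms) for a 4-layer chain.
latency_table = {
    (0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 4): 1.0,  # unmerged layers
    (0, 2): 1.5, (1, 3): 2.5, (2, 4): 1.25,              # pairs
    (0, 3): 3.5, (0, 4): 5.0,                            # larger merges
}
total, blocks = min_latency_partition(latency_table, 4)
print(total, blocks)  # 2.75 [(0, 2), (2, 4)]
```

In this toy table, merging everything into one block is slower than keeping two merged pairs, which mirrors the point that the latency of a merged convolution has to be measured rather than assumed.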

Code Implementations: 1 repo
