CV · Jun 17, 2021

Layer Folding: Neural Network Depth Reduction using Activation Linearization

arXiv:2106.09309v2 · 23 citations
Originality: Incremental advance
AI Analysis

The work addresses latency in real-time applications on devices with limited computational resources, though it is incremental in that it builds on existing network compression techniques.

The paper tackles the problem of reducing neural network depth for resource-constrained devices by proposing a method to remove non-linear activations and fold consecutive linear layers, resulting in shallower networks that maintain performance on datasets like CIFAR-10, CIFAR-100, and ImageNet.
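As a worked illustration of why removing an activation lets two layers be folded, consider two fully connected layers. The layer shapes and symbols below ($W_1$, $W_2$, $b_1$, $b_2$, $m$, $n$, $k$) are assumptions chosen for the example and are not taken from the paper:

```latex
\begin{aligned}
y &= W_2\,\sigma(W_1 x + b_1) + b_2,
  \qquad W_1 \in \mathbb{R}^{m \times n},\; W_2 \in \mathbb{R}^{k \times m} \\
\sigma(z) = z \;\Longrightarrow\; y &= (W_2 W_1)\,x + (W_2 b_1 + b_2) = W x + b,
  \qquad W = W_2 W_1 \in \mathbb{R}^{k \times n},\; b = W_2 b_1 + b_2
\end{aligned}
```

The folded network is one layer shallower. Its weight count changes from $mn + km$ to $kn$, so folding does not always reduce parameters; its main benefit is fewer sequential layers, which is what drives latency on parallel hardware.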

Despite the increasing prevalence of deep neural networks, their applicability in resource-constrained devices is limited due to their computational load. While modern devices exhibit a high level of parallelism, real-time latency is still highly dependent on network depth. Although recent works show that below a certain depth, the width of shallower networks must grow exponentially, we presume that neural networks typically exceed this minimal depth to accelerate convergence and incrementally increase accuracy. This motivates us to transform pre-trained deep networks that already exploit such advantages into shallower forms. We propose a method that learns whether non-linear activations can be removed, so that consecutive linear layers can be folded into one. We apply our method to networks pre-trained on CIFAR-10 and CIFAR-100 and find that they can all be transformed into shallower forms that share a similar depth. Finally, we use our method to provide more efficient alternatives to MobileNetV2 and EfficientNet-Lite architectures on the ImageNet classification task.
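A minimal NumPy sketch of the idea, assuming a learnable activation that interpolates between ReLU and the identity. Here this is a leaky ReLU whose negative slope `alpha` is a trainable scalar, so `alpha = 0` is ReLU and `alpha = 1` is the identity. This parameterization and the helper names `linearized_relu`, `fold_dense`, and `two_layer_forward` are hypothetical and are not taken from the paper's code. Once `alpha` reaches 1, the two dense layers around the activation can be replaced by a single one:

```python
import numpy as np


def linearized_relu(x: np.ndarray, alpha: float) -> np.ndarray:
    """Hypothetical activation: leaky ReLU with learnable negative slope alpha.

    alpha = 0 recovers ReLU; alpha = 1 makes the activation the identity.
    """
    return np.where(x >= 0, x, alpha * x)


def fold_dense(w1, b1, w2, b2):
    """Fold y = w2 @ (w1 @ x + b1) + b2 into one affine layer (w, b)."""
    return w2 @ w1, w2 @ b1 + b2


def two_layer_forward(x, w1, b1, w2, b2, alpha):
    """Original two-layer block with the interpolating activation in between."""
    return w2 @ linearized_relu(w1 @ x + b1, alpha) + b2


rng = np.random.default_rng(0)
w1, b1 = rng.standard_normal((64, 32)), rng.standard_normal(64)
w2, b2 = rng.standard_normal((10, 64)), rng.standard_normal(10)
x = rng.standard_normal(32)

w, b = fold_dense(w1, b1, w2, b2)
print(w.shape)  # (10, 32)

# alpha == 1: the activation is the identity, so the folded layer is exact.
print(np.allclose(two_layer_forward(x, w1, b1, w2, b2, 1.0), w @ x + b))  # True

# alpha == 0 (plain ReLU): folding is generally not exact for random weights.
print(np.allclose(two_layer_forward(x, w1, b1, w2, b2, 0.0), w @ x + b))
```

In this sketch the decision of whether to fold is made per activation after training: only activations whose slope has converged to 1 are removed, and the surrounding layers are merged with `fold_dense`.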

Code Implementations: 1 repo
