LG · AI · CV · Jan 5, 2025

PTEENet: Post-Trained Early-Exit Neural Networks Augmentation for Inference Cost Optimization

arXiv:2501.02508v1 · 11 citations · h-index: 6 · IEEE Access
Originality: Incremental advance
AI Analysis

This work addresses inference cost optimization for practical applications where computational resources are limited, but it is an incremental advance, since it builds on existing early-exit methods.

The authors tackled the problem of high computational cost in deep neural network inference by introducing a method to add early-exit branches to pre-trained models, enabling real-time control over the trade-off between speed and accuracy. Their results showed a reduction in average inference computational cost on image datasets like SVHN and CIFAR10 with architectures such as ResNet, DenseNet, and VGG.
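The "average inference computational cost" is an expectation over where samples leave the network. A minimal sketch, assuming each exit's cumulative cost (backbone layers up to that exit plus every branch evaluated along the way) is known and normalized so the unmodified backbone costs 1.0. The function name, branch positions, and all numbers are hypothetical, not values from the paper.

```python
from typing import Sequence


def average_inference_cost(exit_fractions: Sequence[float],
                           cumulative_costs: Sequence[float]) -> float:
    """Expected per-sample compute of an early-exit network (hypothetical helper).

    exit_fractions[i]: share of samples that leave at exit i; the last entry
        is the original (final) classifier.
    cumulative_costs[i]: compute spent on a sample that leaves at exit i,
        i.e. backbone layers up to that point plus every branch evaluated on
        the way, in units where the unmodified backbone costs 1.0.
    """
    if len(exit_fractions) != len(cumulative_costs):
        raise ValueError("need exactly one cost per exit")
    if abs(sum(exit_fractions) - 1.0) > 1e-9:
        raise ValueError("exit fractions must sum to 1")
    return sum(f * c for f, c in zip(exit_fractions, cumulative_costs))


# Hypothetical network: branches after 30% and 60% of the backbone,
# each branch adding 0.05 to the cost of any sample that reaches it.
costs = [0.30 + 0.05, 0.60 + 0.05 + 0.05, 1.00 + 0.05 + 0.05]

print(f"{average_inference_cost([0.5, 0.3, 0.2], costs):.3f}")  # 0.605
print(f"{average_inference_cost([0.0, 0.0, 1.0], costs):.3f}")  # 1.100
```

The second call shows why the exit thresholds matter: if no sample leaves early, the branch overhead makes the augmented model more expensive than the original backbone.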

For many practical applications, a high computational cost of inference over deep network architectures might be unacceptable. A small degradation in the overall inference accuracy might be a reasonable price to pay for a significant reduction in the required computational resources. In this work, we describe a method for introducing "shortcuts" into the DNN feedforward inference process by skipping costly feedforward computations whenever possible. The proposed method is based on the previously described BranchyNet (Teerapittayanon et al., 2016) and the EEnet (Demir, 2019) architectures, which jointly train the main network and the early-exit branches. We extend those methods by attaching branches to pre-trained models, thus eliminating the need to alter the original weights of the network. We also suggest a new branch architecture based on convolutional building blocks to allow enough training capacity when applied to large DNNs. The proposed architecture includes confidence heads that are used for predicting the confidence level in the corresponding early exits. By defining adjusted thresholds on these confidence extensions, we can control in real time the amount of data exiting from each branch and the overall tradeoff between the speed and accuracy of our model. In our experiments, we evaluate our method using image datasets (SVHN and CIFAR10) and several DNN architectures (ResNet, DenseNet, VGG) of varied depth. Our results demonstrate that the proposed method enables us to reduce the average inference computational cost and to further control the tradeoff between model accuracy and computation cost.
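The threshold mechanism can be sketched as a loop that evaluates exits from shallow to deep and stops at the first one whose confidence output clears its threshold. This is a minimal illustration, not the authors' code: `Stage`, `early_exit_predict`, and `make_stage` are hypothetical names, each stage is a callable standing in for "run the next frozen backbone segment plus its branch" (a real model would pass the intermediate feature map from one segment to the next), and the confidence values are made up.

```python
from typing import Callable, List, Sequence, Tuple

# A "stage" (hypothetical abstraction) runs the next backbone segment plus the
# branch attached to it and returns (predicted_class, confidence).
Stage = Callable[[], Tuple[int, float]]


def early_exit_predict(stages: Sequence[Stage],
                       thresholds: Sequence[float]) -> Tuple[int, int]:
    """Evaluate stages shallow to deep; stop at the first confident exit.

    thresholds[i] applies to early exit i. The last stage is the original
    classifier: it always answers, so it takes no threshold.
    Returns (prediction, index of the exit that produced it).
    """
    if len(thresholds) != len(stages) - 1:
        raise ValueError("need one threshold per early exit")
    for i, tau in enumerate(thresholds):
        pred, conf = stages[i]()  # this stage's compute is spent only now
        if conf >= tau:
            return pred, i
    pred, _ = stages[-1]()  # the final classifier's confidence is ignored
    return pred, len(stages) - 1


# Toy usage: record which stages actually run.
calls: List[int] = []


def make_stage(i: int, pred: int, conf: float) -> Stage:
    def run() -> Tuple[int, float]:
        calls.append(i)
        return pred, conf
    return run


stages = [make_stage(0, 3, 0.40), make_stage(1, 7, 0.92), make_stage(2, 7, 0.99)]
print(early_exit_predict(stages, thresholds=[0.9, 0.9]))  # (7, 1)
print(calls)                                              # [0, 1]
```

Raising the thresholds to `[0.95, 0.95]` sends this sample on to the final classifier and returns `(7, 2)`. Because the thresholds are plain inputs to the loop, they can be changed at inference time, which is the speed/accuracy knob the abstract describes.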

