NE, LG, ML, Jan 23, 2018

Flexible Deep Neural Network Processing

arXiv:1801.07353v1, 8 citations
Originality: Incremental advance
AI Analysis

This work addresses a critical deployment bottleneck for DNNs in data centers and embedded systems by offering a flexible trade-off between quality and runtime, though it appears incremental because it builds on existing network architectures.

The paper tackles the high computational cost of deploying state-of-the-art deep neural networks (DNNs) in ensembles, a challenge for platforms with tight latency and energy budgets. It introduces a flexible DNN ensemble processing technique that achieves a large reduction in average inference latency with a small to negligible accuracy drop.

The recent success of Deep Neural Networks (DNNs) has drastically improved the state of the art for many application domains. While they achieve high accuracy, deploying state-of-the-art DNNs is a challenge since they typically require billions of expensive arithmetic computations. In addition, DNNs are typically deployed in ensembles to boost accuracy, which further exacerbates the system requirements. This computational overhead is an issue for many platforms, e.g., data centers and embedded systems, with tight latency and energy budgets. In this article, we introduce a flexible DNN ensemble processing technique, which achieves a large reduction in average inference latency while incurring a small to negligible accuracy drop. Our technique is flexible in that it allows for dynamic adaptation between quality of results (QoR) and execution runtime. We demonstrate the effectiveness of the technique on AlexNet and ResNet-50 using the ImageNet dataset. This technique can also easily handle other types of networks.
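
The abstract does not spell out how the ensemble is processed, so the following is only a minimal sketch of one common way to trade QoR against runtime, not the paper's method. It assumes the ensemble members are evaluated one at a time and that evaluation stops once the averaged top-class probability reaches a tunable threshold. The function name `flexible_ensemble_predict`, the `confidence_threshold` parameter, and the toy models are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical model type: maps an input to a list of class probabilities.
Model = Callable[[Sequence[float]], List[float]]


def flexible_ensemble_predict(
    models: Sequence[Model],
    x: Sequence[float],
    confidence_threshold: float,
) -> Tuple[int, int]:
    """Evaluate ensemble members sequentially and stop early when confident.

    Returns (predicted_class, number_of_models_evaluated).
    Assumption: early stopping on averaged confidence is an illustrative
    stand-in for the paper's technique, which the abstract does not detail.
    """
    if not models:
        raise ValueError("ensemble must contain at least one model")

    running_sum: List[float] = []
    best, used = 0, 0
    for used, model in enumerate(models, start=1):
        probs = model(x)
        if not running_sum:
            running_sum = list(probs)
        else:
            running_sum = [s + p for s, p in zip(running_sum, probs)]

        # Average the probabilities of the members evaluated so far.
        avg = [s / used for s in running_sum]
        best = max(range(len(avg)), key=avg.__getitem__)

        # A lower threshold exits sooner (less runtime, possibly lower QoR).
        if avg[best] >= confidence_threshold:
            break
    return best, used


if __name__ == "__main__":
    # Toy stand-ins for trained networks; each returns fixed probabilities.
    toy_models: List[Model] = [
        lambda x: [0.55, 0.45],
        lambda x: [0.90, 0.10],
        lambda x: [0.80, 0.20],
    ]
    for threshold in (0.5, 0.7, 0.99):
        label, used = flexible_ensemble_predict(toy_models, [0.0], threshold)
        print(f"threshold={threshold}: class {label} after {used} model(s)")
```

In this sketch the threshold is the single knob for dynamic adaptation: raising it makes more inputs run through the full ensemble, while lowering it reduces average latency at the risk of more errors on hard inputs.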

