LG SP ML · Jun 3, 2020

Communication-Computation Trade-Off in Resource-Constrained Edge Inference

arXiv:2006.02166v2 · 156 citations
AI Analysis

This paper addresses efficient AI inference on resource-constrained edge devices. The contribution is incremental, as it builds on existing edge computing methods.

The paper tackles the trade-off between computation and communication costs in edge AI inference by proposing a three-step framework for device-edge co-inference, resulting in significantly reduced inference latency compared to baseline methods.
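The computation-communication trade-off can be made concrete with a toy latency model. The sketch below assumes a purely sequential pipeline (on-device layers, then uplink transmission of the split layer's output, then the remaining layers on the server). It ignores the downlink and any pipelining, and it uses made-up per-layer numbers. `Layer`, `end_to_end_latency`, and `best_split` are hypothetical names, not the paper's code. Under these assumptions, choosing the split reduces to an exhaustive search over the L + 1 candidate points.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    # Hypothetical per-layer profile; all numbers used below are illustrative.
    name: str
    device_ms: float  # compute time of this layer on the edge device (ms)
    server_ms: float  # compute time of this layer on the edge server (ms)
    out_bits: int     # size of this layer's output feature (bits)


def end_to_end_latency(layers: List[Layer], split: int,
                       input_bits: int, rate_bps: float) -> float:
    """Latency (ms) when layers[:split] run on the device and layers[split:] on the server.

    split == 0 uploads the raw input; split == len(layers) runs everything locally
    (the final result is assumed small enough to ignore).
    """
    device = sum(l.device_ms for l in layers[:split])
    server = sum(l.server_ms for l in layers[split:])
    if split == len(layers):
        uplink_bits = 0
    elif split == 0:
        uplink_bits = input_bits
    else:
        uplink_bits = layers[split - 1].out_bits
    comm = uplink_bits / rate_bps * 1000.0  # seconds -> milliseconds
    return device + comm + server


def best_split(layers: List[Layer], input_bits: int, rate_bps: float) -> int:
    """Exhaustively pick the split point with the lowest modeled latency."""
    return min(range(len(layers) + 1),
               key=lambda k: end_to_end_latency(layers, k, input_bits, rate_bps))


if __name__ == "__main__":
    toy = [
        Layer("conv1", device_ms=20, server_ms=1.0, out_bits=800_000),
        Layer("conv2", device_ms=30, server_ms=1.5, out_bits=200_000),
        Layer("conv3", device_ms=60, server_ms=3.0, out_bits=100_000),
        Layer("fc", device_ms=60, server_ms=2.0, out_bits=8_000),
    ]
    for rate in (0.5e6, 2e6, 20e6):  # uplink rates in bit/s
        k = best_split(toy, input_bits=1_000_000, rate_bps=rate)
        print(f"{rate / 1e6:g} Mbit/s -> split after {k} layer(s), "
              f"{end_to_end_latency(toy, k, 1_000_000, rate):.1f} ms")
```

With these toy numbers, the chosen split moves from fully local execution at 0.5 Mbit/s (170.0 ms), to a split after `conv2` at 2 Mbit/s (155.0 ms), to full offloading at 20 Mbit/s (57.5 ms). A slow link favors more on-device computation, while a fast link favors sending data early.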

Recent breakthroughs in artificial intelligence (AI), especially in deep neural networks (DNNs), have affected every branch of science and technology. In particular, edge AI has been envisioned as a major application scenario for providing DNN-based services at edge devices. This article presents effective methods for edge inference at resource-constrained devices. It focuses on device-edge co-inference assisted by an edge computing server, and investigates a critical trade-off between the computation cost of the on-device model and the communication cost of forwarding the intermediate feature to the edge server. A three-step framework is proposed for effective inference: (1) model split point selection to determine the on-device model, (2) communication-aware model compression to simultaneously reduce the on-device computation and the resulting communication overhead, and (3) task-oriented encoding of the intermediate feature to further reduce the communication overhead. Experiments demonstrate that our proposed framework achieves a better trade-off and significantly reduces the inference latency compared with baseline methods.
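Steps (2) and (3) both shrink the uplink payload. As a rough illustration only, the sketch below makes two assumptions. First, communication-aware compression is modeled as removing whole output channels of the split layer. Second, plain uniform scalar quantization stands in for the task-oriented encoder, which is not reproduced here. `feature_bits`, `quantize_uniform`, and `dequantize` are hypothetical helpers.

```python
from typing import List, Tuple


def feature_bits(channels: int, height: int, width: int, bits_per_value: int) -> int:
    """Uplink payload (bits) of a C x H x W intermediate feature, ignoring headers."""
    return channels * height * width * bits_per_value


def quantize_uniform(values: List[float], bits: int) -> Tuple[List[int], float, float]:
    """Map floats to integer codes in [0, 2**bits - 1] with a uniform step.

    Returns (codes, offset, step) so the receiver can dequantize.
    """
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    # Clamp to guard against floating-point overshoot at the range ends.
    codes = [min(levels, max(0, round((v - lo) / step))) for v in values]
    return codes, lo, step


def dequantize(codes: List[int], offset: float, step: float) -> List[float]:
    """Reconstruct approximate feature values on the server side."""
    return [offset + c * step for c in codes]


if __name__ == "__main__":
    # Illustrative sizes: a 256 x 14 x 14 float32 feature vs. 64 kept channels at 4 bits.
    full = feature_bits(256, 14, 14, 32)
    compact = feature_bits(64, 14, 14, 4)
    print(full, compact, full // compact)  # 1605632 50176 32

    feature = [0.0, 0.12, 0.47, 0.9, 1.0]
    codes, offset, step = quantize_uniform(feature, bits=2)
    restored = dequantize(codes, offset, step)
    # Rounding to the nearest level keeps each error within half a quantization step.
    print(codes, max(abs(a - b) for a, b in zip(feature, restored)) <= step / 2)
```

In this toy case the payload shrinks by a factor of 32: 4x from keeping 64 of 256 channels and 8x from sending 4-bit instead of 32-bit values. The sketch does not model the accuracy cost of such aggressive compression, which is what a task-oriented design has to control.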

Code Implementations: 1 repo