LG · ML · Nov 23, 2019

Compressing Representations for Embedded Deep Learning

arXiv:1911.10321v1 · 3 citations
Originality: Incremental advance
AI Analysis

This paper addresses the high computational demands of deep learning on embedded devices and offers a practical approach for mobile and IoT applications. The advance is incremental, since it builds on existing architectures such as MobileNetV2.

The paper tackles the challenge of enabling deep learning on embedded devices by proposing a distributed inference approach that splits computation between local devices and the cloud, using compression to reduce communication costs. It shows that an optimal splitting layer can be found with a PCA-based scheme, achieving a balance between computation, bandwidth, and accuracy.

Despite recent advances in architectures for mobile devices, the computational requirements of deep learning remain prohibitive for most embedded devices. To address that issue, we envision sharing the computational costs of inference between local devices and the cloud, taking advantage of the compression performed by the first layers of the networks to reduce communication costs. Inference in such a distributed setting would allow new applications, but requires balancing a triple trade-off between computation cost, communication bandwidth, and model accuracy. We explore that trade-off by studying the compressibility of representations at different stages of MobileNetV2, showing that those results agree with theoretical intuitions about deep learning, and that an optimal splitting layer for the network can be found with a simple PCA-based compression scheme.
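To make the idea of PCA-based compression at a splitting layer concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function names (fit_pca, compress, decompress, relative_error) are hypothetical, and the activations are synthetic low-rank data standing in for the output of an intermediate MobileNetV2 layer. The device would send only the low-dimensional codes, and the cloud would reconstruct the representation before running the remaining layers.

```python
import numpy as np


def fit_pca(features, n_components):
    """Fit PCA on rows of `features` (n_samples, n_dims).

    Returns the feature mean and the top `n_components` principal directions.
    """
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def compress(features, mean, basis):
    """Device side: project activations onto the PCA basis."""
    return (features - mean) @ basis.T


def decompress(codes, mean, basis):
    """Cloud side: map the codes back to the original feature space."""
    return codes @ basis + mean


def relative_error(original, reconstructed):
    """Frobenius-norm reconstruction error relative to the original."""
    return np.linalg.norm(original - reconstructed) / np.linalg.norm(original)


# Assumption: synthetic activations with low intrinsic dimension (8 of 64),
# used here only as a stand-in for a real intermediate representation.
rng = np.random.default_rng(0)
latent = rng.normal(size=(512, 8))
mixing = rng.normal(size=(8, 64))
activations = latent @ mixing + 0.01 * rng.normal(size=(512, 64))

mean, basis = fit_pca(activations, n_components=8)
codes = compress(activations, mean, basis)      # what would be transmitted
restored = decompress(codes, mean, basis)       # what the cloud would see
err = relative_error(activations, restored)

print("transmitted values per sample:", codes.shape[1], "of", activations.shape[1])
print("relative reconstruction error:", err)
```

In a real study, one would repeat this at several candidate layers and compare the number of retained components (bandwidth) against the accuracy of the full network on the reconstructed activations. That comparison is the trade-off the abstract describes.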

