LG ML

Nov 23, 2019

Compressing Representations for Embedded Deep Learning

Juliano S. Assine, Alan Godoy, Eduardo Valle

arXiv:1911.10321v1

2.73 citations

Originality: Incremental advance

AI Analysis

This paper addresses the high computational demands of deep learning on embedded devices and offers a practical solution for mobile and IoT applications. The advance is incremental, since it builds on existing architectures such as MobileNetV2.

The paper tackles the challenge of enabling deep learning on embedded devices by proposing a distributed inference approach that splits computation between local devices and the cloud, using compression to reduce communication costs. It shows that an optimal splitting layer can be found with a PCA-based scheme, achieving a balance between computation, bandwidth, and accuracy.

Despite recent advances in architectures for mobile devices, the computational requirements of deep learning remain prohibitive for most embedded devices. To address that issue, we envision sharing the computational costs of inference between local devices and the cloud, taking advantage of the compression performed by the first layers of the networks to reduce communication costs. Inference in such a distributed setting would allow new applications, but requires balancing a triple trade-off between computation cost, communication bandwidth, and model accuracy. We explore that trade-off by studying the compressibility of representations at different stages of MobileNetV2, showing that those results agree with theoretical intuitions about deep learning, and that an optimal splitting layer for the network can be found with a simple PCA-based compression scheme.
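As a rough illustration of the idea in the abstract, the sketch below compresses activations with PCA on the device side and reconstructs them on the cloud side. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify how the PCA is applied, so each activation is treated here as a flat vector, the data is synthetic rather than taken from MobileNetV2, and the names `fit_pca`, `compress`, and `decompress` are hypothetical.

```python
import numpy as np


def fit_pca(features, k):
    """Fit PCA on activations of shape (n_samples, n_dims); keep k components."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]


def compress(features, mean, components):
    """Device side (assumed): project activations onto the k retained directions."""
    return (features - mean) @ components.T


def decompress(codes, mean, components):
    """Cloud side (assumed): map the k-dimensional codes back to the original space."""
    return codes @ components + mean


# Synthetic stand-in for activations at a candidate splitting layer:
# 64-dimensional vectors that lie close to an 8-dimensional subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(512, 8))
mixing = rng.normal(size=(8, 64))
features = latent @ mixing + 0.01 * rng.normal(size=(512, 64))

mean, components = fit_pca(features, k=8)
codes = compress(features, mean, components)          # what would be transmitted
restored = decompress(codes, mean, components)        # what the cloud would see
rel_error = np.linalg.norm(features - restored) / np.linalg.norm(features)
```

Here only `codes` (8 values per sample instead of 64) would cross the network. Repeating this measurement at several layers and several values of `k` is one way to probe the trade-off between bandwidth and accuracy that the abstract describes. A real evaluation would measure the downstream model's accuracy rather than reconstruction error alone.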
