LG | Dec 6, 2017

High performance ultra-low-precision convolutions on mobile devices

arXiv:1712.02427v1 | 29 citations | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses performance bottlenecks for real-time computer vision on resource-limited mobile devices, and represents a strong incremental improvement in optimization.

The paper tackles the limited computation power available for mobile deep learning, especially on older ARMv7 devices, by providing an open-source ultra-low-precision (<4 bit) implementation that achieves speedups of 4x-20x over float32 and int8 baselines.

Many applications of mobile deep learning, especially real-time computer vision workloads, are constrained by computation power. This is particularly true for workloads running on older consumer phones, where a typical device might be powered by a single- or dual-core ARMv7 CPU. We provide an open-source implementation and a comprehensive analysis of (to our knowledge) the state-of-the-art ultra-low-precision (<4 bit precision) implementation of the core primitives required for modern deep learning workloads on ARMv7 devices, and demonstrate speedups of 4x-20x over our additional state-of-the-art float32 and int8 baselines.
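To give a sense of why sub-4-bit arithmetic can be fast, here is a minimal sketch of one common technique, a bit-serial dot product built from AND and popcount. The abstract does not describe the paper's kernels, so treating this as the relevant technique is an assumption. The names `bitplanes` and `bitserial_dot`, the unsigned 2-bit encoding, and the sample vectors are all hypothetical and chosen only for illustration.

```python
def bitplanes(values, bits):
    """Pack bit b of every value into one integer per bit position (a bit plane)."""
    planes = []
    for b in range(bits):
        plane = 0
        for i, v in enumerate(values):
            plane |= ((v >> b) & 1) << i
        planes.append(plane)
    return planes


def bitserial_dot(a, w, a_bits=2, w_bits=2):
    """Dot product of two unsigned low-bit vectors using only AND and popcount.

    Writing a = sum_i 2^i * a_i and w = sum_j 2^j * w_j in terms of bit planes,
    the dot product is sum_{i,j} 2^(i+j) * popcount(a_i AND w_j).
    """
    a_planes = bitplanes(a, a_bits)
    w_planes = bitplanes(w, w_bits)
    total = 0
    for i, ap in enumerate(a_planes):
        for j, wp in enumerate(w_planes):
            total += bin(ap & wp).count("1") << (i + j)
    return total


# Hypothetical 2-bit activations and weights (all values in 0..3).
activations = [3, 1, 0, 2, 1, 3, 2, 0]
weights = [1, 2, 3, 0, 1, 1, 2, 3]

# The bit-serial result matches the ordinary integer dot product.
assert bitserial_dot(activations, weights) == sum(
    x * y for x, y in zip(activations, weights)
)
```

In this sketch the inner loop runs `a_bits * w_bits` AND/popcount steps over whole packed words, so its cost grows with the product of the two bit widths. That is one plausible reason very low precisions pay off on CPUs with vectorized popcount, though the actual speedups reported in the abstract depend on the authors' own implementation.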

