LG · ML · Sep 24, 2018

No Multiplication? No Floating Point? No Problem! Training Networks for Efficient Inference

arXiv:1809.09244v2 · 8 citations
AI Analysis

This enables efficient AI deployment on devices like hearing aids and wearables, though it is incremental as it builds on existing discretization techniques.

The paper tackled the problem of deploying deep neural networks on resource-constrained devices by avoiding floating-point operations and multiplications during inference, achieving memory usage less than one-third of equivalent floating-point networks without performance loss on tasks like auto-encoding and ImageNet classification.

For successful deployment of deep neural networks on highly resource-constrained devices (hearing aids, earbuds, wearables), we must simplify the types of operations and the memory/power resources used during inference. Completely avoiding inference-time floating-point operations is one of the simplest ways to design networks for these highly constrained environments. By discretizing both our in-network non-linearities and our network weights, we can move to simple, compact networks without floating-point operations, without multiplications, and without any non-linear function computations. Our approach allows us to explore the spectrum of possible networks, ranging from fully continuous versions down to networks with bi-level weights and activations. Our results show that discretization can be done without loss of performance and that we can train a network that will successfully operate without floating-point, without multiplication, and with less RAM on both regression tasks (auto-encoding) and multi-class classification tasks (ImageNet). The memory needed to deploy our discretized networks is less than one third of that of the equivalent architecture that does use floating-point operations.
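
To make the idea concrete, here is a minimal sketch of how a single neuron could be evaluated once both activations and weights are restricted to small discrete sets. It is an illustration under stated assumptions, not the authors' implementation: the lookup-table layout, the fixed-point `SCALE`, the choice of `tanh` as the underlying non-linearity, and the names `build_tables` and `neuron_forward` are all hypothetical. Floating point is used only offline to build the tables; the inference function uses integer additions, table lookups, and comparisons.

```python
import math
from bisect import bisect_right

SCALE = 1 << 10  # assumed fixed-point scale; not taken from the paper


def build_tables(act_levels, weight_values):
    """Offline step (floating point allowed): precompute integer tables.

    act_levels:    sorted, distinct activation output levels in [-1, 1]
    weight_values: the small set of distinct weight values
    """
    # product_table[a][w] holds activation level a times weight value w,
    # stored as a fixed-point integer.
    product_table = [
        [round(a * w * SCALE) for w in weight_values] for a in act_levels
    ]
    # A pre-activation x maps to the level nearest tanh(x). The boundaries
    # are where tanh(x) crosses the midpoint between adjacent levels.
    thresholds = [
        round(math.atanh((lo + hi) / 2) * SCALE)
        for lo, hi in zip(act_levels, act_levels[1:])
    ]
    return product_table, thresholds


def neuron_forward(input_idx, weight_idx, bias_int, product_table, thresholds):
    """Inference step: integer adds, lookups, and comparisons only."""
    acc = bias_int
    for a, w in zip(input_idx, weight_idx):
        acc += product_table[a][w]  # lookup replaces the multiplication
    # Binary search over integer thresholds replaces evaluating tanh.
    return bisect_right(thresholds, acc)


# Hypothetical example: 5 activation levels, 4 distinct weight values.
act_levels = [-1.0, -0.5, 0.0, 0.5, 1.0]
weight_values = [-0.5, 0.0, 0.25, 1.0]
product_table, thresholds = build_tables(act_levels, weight_values)

# Inputs and weights are stored as small indices into the sets above.
out_idx = neuron_forward([4, 3, 0], [3, 2, 0], 0, product_table, thresholds)
print(act_levels[out_idx])
```

Because inputs and weights are stored as small indices rather than 32-bit floats, storage shrinks along with the arithmetic, which is in the spirit of the memory saving the abstract reports. The exact ratio depends on the number of levels chosen and is not modeled here.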

