LG, AI, NE, ML | Apr 13, 2020

Technical Report: NEMO DNN Quantization for Deployment Model

arXiv:2004.05930v1 | 19 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses deployment challenges faced by machine learning practitioners, though it appears incremental because it builds on existing quantization methods.

The paper tackles the problem of deploying deep neural networks efficiently by proposing a formal framework for layer-wise quantization. Its main contribution is the IntegerDeployable representation, which enables inference using integers only, with no real-valued numbers and no explicit fixed-point representation.

This technical report defines a formal framework for layer-wise quantization of Deep Neural Networks (DNNs), focusing in particular on the problems related to final deployment. It also serves as documentation for the NEMO (NEural Minimization for pytOrch) framework. It describes the four DNN representations used in NEMO (FullPrecision, FakeQuantized, QuantizedDeployable and IntegerDeployable), with a formal definition of the latter two. An important feature of this model, and of the IntegerDeployable representation in particular, is that it enables DNN inference using integers only, without resorting to real-valued numbers in any part of the computation and without relying on an explicit fixed-point numerical representation.
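To make the idea of integer-only inference concrete, the following is a minimal sketch of a single linear layer evaluated with integers only. It is not NEMO's API or the report's exact formulation: the function names (`quantize`, `requant_params`, `integer_linear`), the quantum values, and the 16-bit shift are all hypothetical choices made for illustration. Real numbers appear only in the offline preparation step; the inference function itself uses integer multiplication, addition, and a right shift.

```python
# Illustrative sketch only: names and parameters are assumptions,
# not part of NEMO or of the report's formal model.

def quantize(values, eps, n_bits=8):
    """Offline step: map real values to signed integers with quantum eps."""
    lo, hi = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    return [max(lo, min(hi, round(v / eps))) for v in values]


def requant_params(eps_in, eps_out, shift=16):
    """Offline step: approximate eps_in / eps_out as mul / 2**shift."""
    return round(eps_in / eps_out * (1 << shift)), shift


def integer_linear(w_int, x_int, mul, shift):
    """Inference step: integer dot products, then integer requantization."""
    out = []
    for row in w_int:
        acc = sum(w * x for w, x in zip(row, x_int))  # integer accumulator
        out.append((acc * mul) >> shift)  # rescale; the shift rounds toward -inf
    return out


# Hypothetical quanta for input, weights, and output.
eps_x, eps_w, eps_y = 0.05, 0.01, 0.1

# Offline: convert real-valued tensors to their integer images.
x_int = quantize([0.5, -1.0, 0.25], eps_x)
w_int = [quantize(row, eps_w) for row in [[0.2, -0.1, 0.4], [-0.3, 0.5, 0.1]]]
mul, shift = requant_params(eps_x * eps_w, eps_y)

# Online: no floating-point arithmetic is involved in this call.
y_int = integer_linear(w_int, x_int, mul, shift)
```

Multiplying each entry of `y_int` by `eps_y` recovers an approximation of the real-valued output. The multiply-and-shift step is one common way to change scale without a floating-point or explicit fixed-point type; other rounding policies are possible.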

Code Implementations: 2 repos