LG AI NE ML | Apr 13, 2020

Technical Report: NEMO DNN Quantization for Deployment Model

arXiv:2004.05930v1 10.119 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses deployment challenges for practitioners in machine learning, though it appears incremental as it builds on existing quantization methods.

The paper tackles the problem of deploying deep neural networks efficiently by proposing a formal framework for layer-wise quantization. Its main contribution is the IntegerDeployable representation, which enables inference using only integers, with no real-valued numbers and no explicit fixed-point representation.

This technical report defines a formal framework for layer-wise quantization of Deep Neural Networks (DNNs), focusing in particular on the problems related to final deployment. It also serves as documentation for the NEMO (NEural Minimization for pytOrch) framework. It describes the four DNN representations used in NEMO (FullPrecision, FakeQuantized, QuantizedDeployable and IntegerDeployable), with particular attention to a formal definition of the latter two. An important feature of this model, and of the IntegerDeployable representation in particular, is that it enables DNN inference using only integers, without resorting to real-valued numbers in any part of the computation and without relying on an explicit fixed-point numerical representation.
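To make the idea of integer-only inference concrete, the following is a minimal sketch of one quantized dot product followed by requantization. It is not NEMO's API: the function names (`quantize`, `choose_requant`, `integer_linear`), the quantum values, and the multiply-and-shift requantization are all assumptions chosen for illustration, based on a scheme commonly used in integer-only inference. Real-valued arithmetic appears only in the offline preparation steps; the inference function itself touches nothing but Python integers.

```python
# Hypothetical illustration of integer-only inference; none of these
# names are taken from NEMO.

def quantize(x, eps, lo=0, hi=255):
    """Offline step: map a real value to its integer image floor(x / eps),
    clipped to [lo, hi]. eps is the quantum (real value of one integer step)."""
    q = int(x // eps)
    return max(lo, min(q, hi))


def choose_requant(eps_in, eps_out, d=16):
    """Offline step: approximate the real ratio eps_in / eps_out by the
    integer fraction m / 2**d, so inference never needs the real ratio."""
    m = round(eps_in / eps_out * 2 ** d)
    return m, d


def integer_linear(q_x, q_w, m, d, lo=0, hi=255):
    """Inference step, integers only: dot product, then requantization
    by integer multiply and right shift, then clipping."""
    acc = sum(a * b for a, b in zip(q_x, q_w))  # integer accumulator
    out = (acc * m) >> d                        # rescale to the output quantum
    return max(lo, min(out, hi))                # clip (acts like a bounded ReLU)


# Assumed quanta (powers of two keep this toy example exact).
eps_x, eps_w, eps_y = 2 ** -4, 2 ** -3, 2 ** -2

x = [0.5, 1.0, 1.5]
w = [0.25, 0.5, 0.75]

q_x = [quantize(v, eps_x) for v in x]                   # [8, 16, 24]
q_w = [quantize(v, eps_w, lo=-128, hi=127) for v in w]  # [2, 4, 6]

# The accumulator's quantum is eps_x * eps_w; rescale it to eps_y.
m, d = choose_requant(eps_x * eps_w, eps_y)

q_y = integer_linear(q_x, q_w, m, d)
print(q_y, q_y * eps_y)  # 7 1.75
```

The integer result 7 corresponds to the real value 7 × 0.25 = 1.75, which matches the full-precision dot product 0.5·0.25 + 1.0·0.5 + 1.5·0.75. With quanta that are not powers of two, the `m / 2**d` approximation introduces a small rounding error that shrinks as `d` grows.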
