MOGNET: A Mux-residual quantized Network leveraging Online-Generated weights
This work addresses the need for efficient deep learning models on hardware with strict memory constraints, representing an incremental improvement in model compression techniques.
The paper tackles the problem of designing compact neural networks for resource-limited hardware by introducing MOGNET, which uses online-generated weights and a multiplexer mechanism for low-precision quantization, achieving up to 1% higher accuracy with sub-2Mb memory compared to state-of-the-art methods.
This paper presents a compact model architecture called MOGNET, compatible with resource-limited hardware. MOGNET uses a streamlined convolutional factorization block that combines two point-wise (1x1) convolutions with a group-wise convolution in between. To further limit the overall model size and reduce the required on-chip memory, the parameters of the second point-wise convolution are generated online by a Cellular Automaton structure. In addition, MOGNET enables the use of low-precision weights and activations by relying on a Multiplexer mechanism with appropriate Bitshift rescaling to integrate residual paths without increasing hardware complexity. To train this model efficiently, we also introduce a novel weight ternarization method that favors balance between the quantized levels. Experimental results show that, given a tiny memory budget (sub-2Mb), MOGNET achieves higher accuracy, with a clear gap of up to 1%, at a similar or even lower model size compared to recent state-of-the-art methods.
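To illustrate the idea of online-generated weights, the following is a minimal sketch and not the authors' implementation. It assumes an elementary one-dimensional cellular automaton (Rule 30), wrap-around borders, and a mapping of cell states to +1/-1 weights; the abstract does not specify the rule, the seed, or the mapping, so all of these are hypothetical, as are the function names `ca_step` and `generate_pointwise_weights`. The point is only that a full point-wise weight matrix can be regenerated from a short stored seed.

```python
def ca_step(cells, rule=30):
    """Advance a 1-D binary cellular automaton by one step.

    Borders wrap around. `rule` is the Wolfram rule number
    (Rule 30 is an assumption made for this sketch).
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        left, centre, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        idx = (left << 2) | (centre << 1) | right  # 3-cell neighbourhood as 0..7
        nxt.append((rule >> idx) & 1)              # look up the next state
    return nxt


def generate_pointwise_weights(seed, c_out, c_in, rule=30):
    """Regenerate a c_out x c_in matrix of +1/-1 weights from a stored seed.

    Each automaton step yields one row (one output channel) of a
    1x1 convolution kernel. Only `seed` has to be kept in memory.
    """
    if len(seed) != c_in:
        raise ValueError("seed length must equal the number of input channels")
    cells = list(seed)
    rows = []
    for _ in range(c_out):
        cells = ca_step(cells, rule)
        rows.append([2 * bit - 1 for bit in cells])  # map {0, 1} to {-1, +1}
    return rows


# Hypothetical sizes: 16 input channels, 32 output channels.
seed = [0] * 7 + [1] + [0] * 8
weights = generate_pointwise_weights(seed, c_out=32, c_in=16)
stored_bits = len(seed)                      # what must be stored
generated_bits = len(weights) * len(weights[0])  # what is produced on the fly
```

In this toy setting, a 16-bit seed stands in for 512 one-bit weights, which is the kind of memory saving the online generation targets. A real design would choose the rule and seed so that the generated weights are suitable for training.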