LG, Jul 20, 2022

Mixed-Precision Inference Quantization: Radically Towards Faster inference speed, Lower Storage requirement, and Lower Loss

arXiv:2207.10083v11.8h-index: 37

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of reducing storage and improving inference speed in neural networks while maintaining or improving accuracy. The contribution appears incremental, since it builds on existing quantization techniques.

The paper tackles a common drawback of model quantization: the quantized model usually has a higher loss than the full-precision model. It proposes a mixed-precision quantization method that, according to the authors, achieves a lower loss than the full-precision model. Its analysis indicates that, during inference, the loss function is affected mainly by noise in the layer inputs.

Because neural networks are resilient to computational noise, model quantization is an important tool for compressing models and improving computing speed. Existing quantization techniques rely heavily on experience and "fine-tuning" skills. In most cases, the quantized model has a larger loss than the full-precision model. This study provides a methodology for obtaining a mixed-precision quantized model with a lower loss than the full-precision model. In addition, the analysis demonstrates that, throughout the inference process, the loss function is mostly affected by the noise in the layer inputs. In particular, we demonstrate that neural networks with massive identity mappings are resistant to quantization, and that it is also difficult to improve the performance of these networks using quantization.
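To make the idea of layer-input noise concrete, the following is a minimal sketch, not the paper's method. It assumes a toy two-layer ReLU network with random weights and a simple symmetric uniform quantizer applied to each layer's input; the names `quantize_uniform`, `forward`, and `mse`, as well as the bit-width configurations, are hypothetical and chosen only for illustration. It compares the loss of the full-precision forward pass with the loss obtained when each layer's input is quantized at a different bit-width.

```python
import numpy as np


def quantize_uniform(x, bits):
    """Symmetric uniform quantization followed by dequantization.

    Assumes bits >= 2. This is a generic quantizer used for illustration,
    not the scheme proposed in the paper.
    """
    levels = 2 ** (bits - 1) - 1           # e.g. 127 for 8 bits
    max_abs = np.max(np.abs(x))
    if max_abs == 0:
        return x.copy()
    scale = max_abs / levels
    q = np.clip(np.round(x / scale), -levels, levels)
    return q * scale


def forward(x, weights, input_bits=None):
    """Toy ReLU MLP. input_bits[i] is the bit-width used for the input of
    layer i; None (for the list or for an entry) means full precision."""
    h = x
    for i, w in enumerate(weights):
        bits = None if input_bits is None else input_bits[i]
        if bits is not None:
            h = quantize_uniform(h, bits)  # inject quantization noise here
        h = h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h


def mse(pred, target):
    return float(np.mean((pred - target) ** 2))


# Hypothetical data and weights, fixed by a seed for reproducibility.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 16))
weights = [rng.normal(size=(16, 32)) / 4.0, rng.normal(size=(32, 8)) / 4.0]
target = rng.normal(size=(64, 8))

loss_fp = mse(forward(x, weights), target)
for cfg in [(8, 8), (8, 4), (4, 8), (4, 4)]:
    loss_q = mse(forward(x, weights, cfg), target)
    # Difference in loss caused only by noise on the layer inputs.
    print(cfg, loss_q - loss_fp)
```

The printed differences show how the loss shifts under each per-layer bit-width assignment. In this toy setting the shift can be positive or negative, which is the kind of effect a mixed-precision search would try to exploit.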
