Quantization in Layer's Input is Matter
This work addresses the challenge of efficient model compression for deployment in resource-constrained environments, though it appears incremental as it builds on existing quantization techniques.
The paper tackles the problem of quantization in neural networks by demonstrating that quantizing layer inputs is more critical for minimizing loss than quantizing parameters, and presents an algorithm based on input quantization error that outperforms Hessian-based mixed precision methods.
In this paper, we show that, with respect to the loss function, quantizing a layer's input matters more than quantizing its parameters. We also show that a mixed-precision layout algorithm based on the layer's input quantization error outperforms Hessian-based mixed-precision layout algorithms.
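
To make the idea of an input-error-based layout concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a symmetric uniform quantizer, mean squared error as the input quantization error, and a simple ranking rule that gives more bits to the layers whose inputs are hurt most by low-precision quantization. The names `quantize_uniform`, `input_quant_error`, and `assign_bits`, and the bit widths 4 and 8, are hypothetical choices made for illustration.

```python
import numpy as np


def quantize_uniform(x, num_bits):
    """Symmetric uniform fake-quantization (assumed scheme): returns floats on a num_bits grid."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = np.max(np.abs(x))
    if max_abs == 0:
        return x.copy()
    scale = max_abs / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale


def input_quant_error(x, num_bits):
    """Mean squared error introduced by quantizing a layer's input (assumed error metric)."""
    return float(np.mean((x - quantize_uniform(x, num_bits)) ** 2))


def assign_bits(layer_inputs, low_bits=4, high_bits=8, num_high=1):
    """Hypothetical layout rule: the num_high layers with the largest
    input error at low_bits get high_bits, all others get low_bits."""
    errors = {name: input_quant_error(x, low_bits) for name, x in layer_inputs.items()}
    ranked = sorted(errors, key=errors.get, reverse=True)
    high = set(ranked[:num_high])
    return {name: (high_bits if name in high else low_bits) for name in layer_inputs}


# Usage: two recorded layer inputs, the second with a 10x larger dynamic range.
x_small = np.linspace(-1.0, 1.0, 101)
layout = assign_bits({"layer_a": x_small, "layer_b": 10.0 * x_small})
print(layout)
```

In this sketch the layout depends only on activations recorded at each layer's input, so no second-order information about the loss is required. A Hessian-based layout would instead rank layers by a curvature estimate of the loss with respect to the parameters.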