Dual Precision Deep Neural Network
This work addresses the need for flexible accuracy-complexity trade-offs during DNN inference. The contribution is incremental, as it builds on existing precision-scaling methods.
The paper tackles on-line precision scalability in deep neural networks by proposing a dual-precision DNN that holds two precision modes in a single model, enabling precision switching without re-training. A two-phase training process optimizes both the low- and high-precision modes.
On-line precision scalability of deep neural networks (DNNs) is a critical feature for supporting accuracy-complexity trade-offs during DNN inference. In this paper, we propose a dual-precision DNN that includes two different precision modes in a single model, thereby supporting on-line precision switching without re-training. The proposed two-phase training process optimizes both the low- and high-precision modes.
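
The abstract does not say how the two precision modes share parameters, so the following is only a minimal sketch of one plausible arrangement, not the authors' method. It assumes that weights are stored once as 8-bit signed integers and that the low-precision mode reads just the upper 4 bits of each stored weight. The bit widths, the truncation scheme, and the names `quantize`, `low_precision_view`, and `infer` are all hypothetical and chosen for illustration.

```python
def quantize(weights, bits, scale):
    """Uniformly quantize floats to signed integers of the given bit width."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    return [max(qmin, min(qmax, round(w / scale))) for w in weights]


def low_precision_view(q_high, high_bits=8, low_bits=4):
    """Derive low-precision codes by dropping the least significant bits.

    Assumption: the low-precision mode is a truncated view of the stored
    high-precision weights, so no second copy of the model is needed.
    """
    shift = high_bits - low_bits
    return [q >> shift for q in q_high]  # arithmetic shift preserves the sign


def infer(q_high, inputs, scale, mode, high_bits=8, low_bits=4):
    """Evaluate one linear unit in the requested precision mode."""
    if mode == "high":
        q, s = q_high, scale
    elif mode == "low":
        q = low_precision_view(q_high, high_bits, low_bits)
        # Each dropped bit doubles the step size of the quantization grid.
        s = scale * 2 ** (high_bits - low_bits)
    else:
        raise ValueError("mode must be 'high' or 'low'")
    return s * sum(w * x for w, x in zip(q, inputs))


# One stored weight set, two modes selected at inference time.
scale = 1 / 128
q_high = quantize([0.5, -0.25, 0.3], bits=8, scale=scale)
inputs = [1.0, 2.0, 3.0]
y_high = infer(q_high, inputs, scale, mode="high")
y_low = infer(q_high, inputs, scale, mode="low")
```

Under these assumptions, switching modes is a run-time choice that touches no stored parameters, which is the property the abstract describes as on-line switching without re-training. The accuracy gap between `y_high` and `y_low` is what a training procedure covering both modes would need to keep small.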