Post-Training BatchNorm Recalibration
This work addresses accuracy loss in hardware-accelerated deep learning and is relevant to practitioners using NB-SMT, but the contribution is incremental because it builds on prior work.
The paper tackles the accuracy degradation that computational noise causes under non-blocking simultaneous multithreading (NB-SMT). It proposes post-training recalibration of batch normalization statistics, which recovers a substantial part of the lost model performance.
We revisit non-blocking simultaneous multithreading (NB-SMT), introduced by Shomron and Weiser (2020). NB-SMT trades accuracy for performance by occasionally "squeezing" more than one thread into a shared multiply-and-accumulate (MAC) unit. However, accommodating more than one thread in a shared MAC unit may add noise to the computations, thereby changing the internal statistics of the model. We show that, in the presence of NB-SMT, substantial model performance can be recouped by post-training recalibration of the batch normalization layers' running mean and running variance statistics.
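To make the recalibration idea concrete, the following is a minimal sketch for a single feature channel, not the authors' implementation. The noise model in `noisy_preactivation` is a hypothetical stand-in (an occasional truncation to a coarser grid) chosen only because it shifts the channel statistics. All function names, the collision probability, and the grid step are assumptions introduced for illustration.

```python
import math
import random
import statistics


def noisy_preactivation(x, rng, collision_prob=0.5, step=0.5):
    """Hypothetical stand-in for NB-SMT noise (an assumption, not the paper's model).

    On a simulated "collision" the value is truncated down to a coarser grid,
    which biases the pre-activation and so shifts the channel statistics.
    """
    if rng.random() < collision_prob:
        return math.floor(x / step) * step
    return x


def batchnorm(x, mean, var, eps=1e-5):
    """Inference-time batch normalization for one channel (no affine terms)."""
    return (x - mean) / math.sqrt(var + eps)


def recalibrate(samples):
    """Re-estimate the running mean and variance from a calibration set."""
    return statistics.fmean(samples), statistics.pvariance(samples)


rng = random.Random(0)

# Pre-activations as seen during training (no noise), and the statistics
# that the batch normalization layer would have stored for them.
clean = [rng.gauss(2.0, 1.0) for _ in range(5000)]
stale_mean, stale_var = recalibrate(clean)

# The same pre-activations passed through the noisy datapath.
noisy = [noisy_preactivation(x, rng) for x in clean]

# Post-training recalibration: recompute the statistics with the noise present.
# No weights are updated; only the mean and variance change.
new_mean, new_var = recalibrate(noisy)

stale_out = [batchnorm(x, stale_mean, stale_var) for x in noisy]
recal_out = [batchnorm(x, new_mean, new_var) for x in noisy]

print("mean with stale statistics:       ", statistics.fmean(stale_out))
print("mean with recalibrated statistics:", statistics.fmean(recal_out))
```

With the stale statistics, the normalized output is no longer centered, because the noise shifted the distribution the layer was fitted to. After recalibration the output is again approximately zero-mean and unit-variance. In a framework such as PyTorch, the analogous step would typically be to reset each batch normalization layer's running statistics and run forward passes over calibration data with the noisy datapath enabled, leaving the weights untouched.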