Learning Frequency Domain Approximation for Binary Neural Networks
This work addresses the problem of training BNNs more effectively, which is relevant to researchers and practitioners in efficient deep learning. The contribution is incremental, as it builds on existing gradient approximation techniques.
The paper tackles the optimization difficulty in binary neural networks (BNNs) by proposing a frequency domain approximation (FDA) method to estimate gradients, achieving state-of-the-art accuracy on benchmark datasets.
Binary neural networks (BNNs) represent the original full-precision weights and activations with 1 bit using the sign function. Since the gradient of the conventional sign function is zero almost everywhere and thus cannot be used for back-propagation, several methods have been proposed to alleviate the optimization difficulty by using an approximate gradient. However, those approximations corrupt the main direction of the true gradient. To this end, we propose to estimate the gradient of the sign function in the Fourier frequency domain using a combination of sine functions for training BNNs, namely frequency domain approximation (FDA). The proposed approach does not affect the low-frequency information of the original sign function, which occupies most of the overall energy, and high-frequency coefficients are ignored to avoid a huge computational overhead. In addition, we embed a noise adaptation module into the training phase to compensate for the approximation error. Experiments on several benchmark datasets and neural architectures show that the binary network learned using our method achieves state-of-the-art accuracy. Code will be available at https://gitee.com/mindspore/models/tree/master/research/cv/FDA-BNN.
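
To make the idea of approximating the sign function with a combination of sine functions concrete, the following is a minimal sketch rather than the authors' implementation. It relies on the standard Fourier series of a square wave, which agrees with the sign function on the interval (-pi/omega, pi/omega), and differentiates the truncated series term by term to obtain a surrogate gradient. The function names and the parameters `n_terms` and `omega` are assumptions chosen for illustration, and the noise adaptation module is not modelled here.

```python
import math


def sign(x):
    """Hard binarization to {-1, +1}, as used in the forward pass of a BNN."""
    return 1.0 if x >= 0 else -1.0


def fourier_sign(x, n_terms=10, omega=1.0):
    """Truncated Fourier series of a square wave with period 2*pi/omega.

    Keeps only the n_terms lowest-frequency sine components; higher
    frequencies are dropped. n_terms and omega are illustrative parameters.
    """
    return (4.0 / math.pi) * sum(
        math.sin((2 * i + 1) * omega * x) / (2 * i + 1)
        for i in range(n_terms)
    )


def fourier_sign_grad(x, n_terms=10, omega=1.0):
    """Term-by-term derivative of fourier_sign, usable as a surrogate gradient."""
    return (4.0 * omega / math.pi) * sum(
        math.cos((2 * i + 1) * omega * x)
        for i in range(n_terms)
    )


if __name__ == "__main__":
    # Compare the hard sign with its low-frequency approximation and gradient.
    for x in (-1.5, -0.5, 0.5, 1.5):
        print(x, sign(x), fourier_sign(x), fourier_sign_grad(x))
```

In a training loop, the hard `sign` would be used in the forward pass and `fourier_sign_grad` would stand in for its derivative during back-propagation. Because the series is periodic, this sketch is only meaningful for inputs inside (-pi/omega, pi/omega), so `omega` would need to be chosen to match the range of the weights or activations.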