Removing the Feature Correlation Effect of Multiplicative Noise
This work addresses a specific issue in regularization for deep neural networks, offering a better alternative to dropout for batch-normalized networks, which is incremental but practical for improving model training.
The authors tackled the problem that multiplicative noise like dropout increases feature correlation, which is undesirable due to increased redundancy, and proposed non-correlating multiplicative noise (NCMN) that uses batch normalization to remove this effect, significantly improving performance on image classification tasks.
Multiplicative noise, including dropout, is widely used to regularize deep neural networks (DNNs), and is shown to be effective in a wide range of architectures and tasks. From an information perspective, we consider injecting multiplicative noise into a DNN as training the network to solve the task with noisy information pathways, which leads to the observation that multiplicative noise tends to increase the correlation between features, so as to increase the signal-to-noise ratio of information pathways. However, high feature correlation is undesirable, as it increases redundancy in representations. In this work, we propose non-correlating multiplicative noise (NCMN), which exploits batch normalization to remove the correlation effect in a simple yet effective way. We show that NCMN significantly improves the performance of standard multiplicative noise on image classification tasks, providing a better alternative to dropout for batch-normalized networks. Additionally, we present a unified view of NCMN and shake-shake regularization, which explains the performance gain of the latter.