Walking Noise: On Layer-Specific Robustness of Neural Architectures against Noisy Computations and Associated Characteristic Learning Dynamics
This work addresses the challenge of ensuring functionally correct results in neural networks when using noisy, energy-efficient computing technologies, which is an incremental step in optimizing for robustness in specific hardware contexts.
The paper tackles the problem of noisy computations in neural networks, which arise from energy-efficient alternative computing technologies, by proposing a methodology called Walking Noise to inject layer-specific noise and measure robustness. The results show that noisy training significantly increases robustness for all noise types, with additive noise improving the signal-to-noise ratio and multiplicative noise leading to self-binarization of parameters for extreme robustness.
Deep neural networks are extremely successful in various applications, however they exhibit high computational demands and energy consumption. This is exacerbated by stuttering technology scaling, prompting the need for novel approaches to handle increasingly complex neural architectures. At the same time, alternative computing technologies such as analog computing, which promise groundbreaking improvements in energy efficiency, are inevitably fraught with noise and inaccurate calculations. Such noisy computations are more energy efficient, and, given a fixed power budget, also more time efficient. However, like any kind of unsafe optimization, they require countermeasures to ensure functionally correct results. This work considers noisy computations in an abstract form, and gears to understand the implications of such noise on the accuracy of neural network classifiers as an exemplary workload. We propose a methodology called Walking Noise which injects layer-specific noise to measure the robustness and to provide insights on the learning dynamics. In more detail, we investigate the implications of additive, multiplicative and mixed noise for different classification tasks and model architectures. While noisy training significantly increases robustness for all noise types, we observe in particular that it results in increased weight magnitudes and thus inherently improves the signal-to-noise ratio for additive noise injection. Contrarily, training with multiplicative noise can lead to a form of self-binarization of the model parameters, leading to extreme robustness. We conclude with a discussion of the use of this methodology in practice, among others, discussing its use for tailored multi-execution in noisy environments.