The Power of Training: How Different Neural Network Setups Influence the Energy Demand
This work raises awareness about the energy impact of ML training for practitioners and researchers, though it is incremental in its heuristic evaluation.
The study evaluated how different neural network training setups and hyperparameters affect energy consumption on HPC hardware, finding that suboptimal configurations can consume up to 5 times the energy of optimal ones to reach the same accuracy.
This work offers a heuristic evaluation, from a life-cycle-aware perspective, of how variations in machine learning training regimes and learning paradigms affect the energy consumption of computing hardware, especially HPC hardware. While increasing data availability and innovation in high-performance hardware fuel the training of sophisticated models, they also erode awareness of the energy consumption and carbon emissions involved. Therefore, the goal of this work is to raise awareness about the energy impact of general training parameters and processes, from learning rate and batch size to knowledge transfer. Multiple setups with different hyperparameter configurations are evaluated on three different hardware systems. Among other results, we found that even with the same model and hardware, reaching the same accuracy with improperly set training hyperparameters consumes up to 5 times the energy of the optimal setup. We also extensively examined the energy-saving benefits of learning paradigms, including recycling knowledge through pretraining and sharing knowledge through multitask training.
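The central comparison above is the energy each setup needs to reach the same accuracy. The following is a minimal sketch of that metric, not the authors' measurement pipeline. It assumes power is sampled at known timestamps (for example from a hardware power meter) and that accuracy is evaluated at the same timestamps. The names `PowerTrace`, `energy_joules`, `energy_to_accuracy`, and `to_kwh` are hypothetical, and trapezoidal integration of power over time is one common choice rather than a method stated in the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PowerTrace:
    """Hypothetical container: power samples (watts) at timestamps (seconds) for one run."""
    timestamps_s: list[float]
    power_w: list[float]


def energy_joules(trace: PowerTrace) -> float:
    """Integrate power over time with the trapezoidal rule; returns energy in joules."""
    t, p = trace.timestamps_s, trace.power_w
    if len(t) != len(p) or len(t) < 2:
        raise ValueError("need at least two matching (timestamp, power) samples")
    total = 0.0
    for i in range(1, len(t)):
        # Area of the trapezoid between consecutive samples.
        total += 0.5 * (p[i] + p[i - 1]) * (t[i] - t[i - 1])
    return total


def energy_to_accuracy(trace: PowerTrace, accuracy: list[float], target: float) -> Optional[float]:
    """Energy consumed until the run first reaches `target` accuracy, or None if it never does.

    Assumption: accuracy[i] was measured at trace.timestamps_s[i].
    """
    if len(accuracy) != len(trace.timestamps_s):
        raise ValueError("accuracy must be sampled at the same timestamps as power")
    for i, acc in enumerate(accuracy):
        if acc >= target:
            if i == 0:
                return 0.0  # target already met before any time elapsed
            prefix = PowerTrace(trace.timestamps_s[: i + 1], trace.power_w[: i + 1])
            return energy_joules(prefix)
    return None


def to_kwh(joules: float) -> float:
    """Convert joules to kilowatt-hours (1 kWh = 3.6e6 J)."""
    return joules / 3.6e6
```

As a purely illustrative usage example with invented numbers (not results from the paper): a run drawing a constant 300 W that reaches the target after 1,200 s uses 360,000 J (0.1 kWh), while a run drawing 250 W that needs 5,760 s uses 1,440,000 J, so the second setup consumes 4 times the energy despite its lower power draw. This is why the comparison is made on energy to a fixed accuracy rather than on power alone.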