Energy Consumption of Neural Networks on NVIDIA Edge Boards: an Empirical Model
This work addresses the need for energy-efficient machine learning at the edge, which is crucial for applications such as IoT and mobile devices. The contribution is incremental: it builds on existing profiling efforts and adds new empirical data.
The paper tackles the problem of predicting the energy consumption of neural network inference on edge hardware. It develops an empirical model from measurements on the NVIDIA Jetson TX2 and Xavier boards, yielding a simple, practical model that estimates the energy use of convolutional and fully connected layers.
Recently, there has been a trend of shifting the execution of deep learning inference tasks toward the edge of the network, closer to the user, to reduce latency and preserve data privacy. At the same time, growing interest is being devoted to the energy sustainability of machine learning. At the intersection of these trends lies the energy characterization of machine learning at the edge, which is attracting increasing attention. Unfortunately, calculating the energy consumption of a given neural network during inference is complicated by the heterogeneity of the possible underlying hardware implementations. In this work, we therefore aim to profile the energy consumption of inference tasks on some modern edge nodes and to derive simple but realistic models. To this end, we performed a large number of experiments to collect the energy consumption of convolutional and fully connected layers on two well-known edge boards by NVIDIA, namely the Jetson TX2 and Xavier. From the measurements, we then distilled a simple, practical model that can provide an estimate of the energy consumption of a given inference task on the considered boards. We believe that this model can be used in many contexts: for instance, to guide the search for efficient architectures in Neural Architecture Search, as a heuristic in neural network pruning, to find energy-efficient offloading strategies in a split computing context, or simply to evaluate the energy performance of deep neural network architectures.
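To illustrate how a per-layer energy model of this kind might be used in practice, the following is a minimal sketch. It is not the model derived in this work: the abstract does not state the model's functional form, so the sketch assumes, purely for illustration, that the energy of each layer is an affine function of its multiply-accumulate (MAC) count. The class names, the `estimate_energy` function, and the coefficients `joules_per_mac` and `static_joules_per_layer` are all hypothetical placeholders, not measured values for the Jetson TX2 or Xavier.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Conv2d:
    """Shape description of a 2D convolutional layer (square kernel, no groups)."""
    in_channels: int
    out_channels: int
    kernel_size: int
    out_height: int
    out_width: int

    def macs(self) -> int:
        # One MAC per kernel weight per output element.
        return (self.in_channels * self.out_channels * self.kernel_size ** 2
                * self.out_height * self.out_width)


@dataclass
class Linear:
    """Shape description of a fully connected layer."""
    in_features: int
    out_features: int

    def macs(self) -> int:
        # One MAC per weight of the dense matrix.
        return self.in_features * self.out_features


Layer = Union[Conv2d, Linear]


def estimate_energy(layers: Sequence[Layer],
                    joules_per_mac: float,
                    static_joules_per_layer: float = 0.0) -> float:
    """Sum a per-layer estimate: static cost + cost proportional to MACs.

    ASSUMPTION: the affine-in-MACs form and both coefficients are
    illustrative placeholders; real values would have to be fitted to
    measurements on the target board.
    """
    return sum(static_joules_per_layer + joules_per_mac * layer.macs()
               for layer in layers)


if __name__ == "__main__":
    network = [
        Conv2d(in_channels=3, out_channels=16, kernel_size=3,
               out_height=32, out_width=32),
        Linear(in_features=128, out_features=10),
    ]
    # Placeholder coefficient chosen only to make the example run.
    print(estimate_energy(network, joules_per_mac=1e-11))
```

A function of this shape could serve as a cost term in the contexts listed above, for example to rank candidate architectures, provided its coefficients are replaced with values fitted to real board measurements.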