LG, DC · Aug 26, 2024

Adaptive Resolution Inference (ARI): Energy-Efficient Machine Learning for Internet of Things

Ziheng Wang, Pedro Reviriego, Farzad Niknia, Javier Conde, Shanshan Liu, Fabrizio Lombardi

arXiv:2408.14528v16.44 citations · h-index: 33

Originality: Incremental advance

AI Analysis

The paper addresses energy efficiency for IoT devices, but the advance is incremental because it builds on existing quantization methods.

The paper tackles the challenge of implementing machine learning on energy-constrained Internet of Things devices by proposing Adaptive Resolution Inference (ARI), which uses reduced precision for most inferences and switches to full precision only when needed, achieving energy savings of 40% to 85% without affecting model performance.

The implementation of machine learning (ML) in Internet of Things devices poses significant operational challenges due to limited energy and computation resources. In recent years, significant efforts have been made to implement simplified ML models that achieve reasonable performance while reducing computation and energy, for example by pruning weights in neural networks or by using reduced precision for the parameters and arithmetic operations. However, this type of approach is limited by the performance of the ML implementation, i.e., by the loss, for example in accuracy, caused by the model simplification. In this article, we present adaptive resolution inference (ARI), a novel approach that enables the evaluation of new tradeoffs between energy dissipation and model performance in ML implementations. The main principle of the proposed approach is to run inferences with reduced precision (quantization) and use the margin over the decision threshold to determine whether the result is reliable or the inference must be rerun with the full model. The rationale is that quantization introduces only small deviations in the inference scores, so if the scores have a sufficient margin over the decision threshold, it is unlikely that the full model would produce a different result. Therefore, we can run the quantized model first and run the full model only when the scores do not have a sufficient margin. This enables most inferences to run with the reduced-precision model, with only a small fraction requiring the full model, thus significantly reducing computation and energy while not affecting model performance. The proposed ARI approach is presented, analyzed in detail, and evaluated using different data sets for floating-point and stochastic computing implementations. The results show that ARI can significantly reduce the energy for inference in different configurations, with savings between 40% and 85%.
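The margin test described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a binary linear classifier with a decision threshold of 0, uses simple uniform quantization as a stand-in for whichever reduced-precision scheme is actually used, and all function names (`quantize`, `ari_predict`, `expected_energy`) are hypothetical. The energy model is likewise an assumption: the quantized model always runs, and the full model adds its cost only for the fraction of inferences that fall back.

```python
def quantize(weights, bits):
    """Uniform symmetric quantization (illustrative stand-in, not the paper's scheme)."""
    levels = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / levels
    if scale == 0:
        return list(weights)  # all-zero weights need no quantization
    return [round(w / scale) * scale for w in weights]


def score(x, weights):
    """Inference score of a linear model: dot product of inputs and weights."""
    return sum(xi * wi for xi, wi in zip(x, weights))


def ari_predict(x, w_full, w_quant, margin, threshold=0.0):
    """Run the quantized model first; fall back to the full model only
    when the quantized score is within `margin` of the threshold.

    Returns (label, used_full_model).
    """
    s_q = score(x, w_quant)
    if abs(s_q - threshold) >= margin:
        # Sufficient margin: accept the reduced-precision result.
        return s_q > threshold, False
    # Insufficient margin: rerun with the full-precision model.
    return score(x, w_full) > threshold, True


def expected_energy(e_quant, e_full, frac_full):
    """Assumed average energy per inference: the quantized run always
    happens, the full run only for a fraction `frac_full` of inputs."""
    return e_quant + frac_full * e_full


# Hypothetical weights and inputs, for illustration only.
w_full = [0.9, -0.35, 0.12, 0.5]
w_quant = quantize(w_full, bits=3)

confident = ari_predict([1.0, 1.0, 1.0, 1.0], w_full, w_quant, margin=0.5)
borderline = ari_predict([0.1, 0.3, 0.0, 0.0], w_full, w_quant, margin=0.5)
```

In this sketch the first input has a quantized score far from the threshold and is accepted directly, while the second lands close to the threshold and triggers the full model. A larger `margin` sends more inferences to the full model, so it trades energy for agreement with the full-precision result; how the margin is chosen in practice is not stated in the abstract.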
