AI AR Sep 30, 2025

Benchmarking Deep Learning Convolutions on Energy-constrained CPUs

Enrique Galvez, Adrien Cassagne, Alix Munier, Manuel Bouyer

arXiv:2509.26217v13.31 citations h-index: 17

Originality Synthesis-oriented

AI Analysis

This work provides practical guidance for energy-aware embedded deployment and addresses CPU convolution implementations, which remain underoptimized relative to GPU and NPU implementations.

This work benchmarks state-of-the-art convolution algorithms for CPU-based deep learning inference, focusing on latency and energy efficiency across modern CPUs, and finds that the Nvidia AGX Orin with the GEMM algorithm achieves the best trade-off.

This work evaluates state-of-the-art convolution algorithms for CPU-based deep learning inference. While most prior studies focus on GPUs or NPUs, CPU implementations remain relatively underoptimized. We benchmark direct, GEMM-based, and Winograd convolutions across modern CPUs from ARM, Intel, AMD, Apple, and Nvidia, considering both latency and energy efficiency. Our results highlight the key architectural factors that govern CPU efficiency for convolution operations, providing practical guidance for energy-aware embedded deployment. As a main result of this work, the Nvidia AGX Orin combined with the GEMM algorithm achieves the best trade-off between inference latency and energy consumption.
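To illustrate the difference between two of the algorithm families named in the abstract, the following is a minimal sketch of a direct convolution and a GEMM-based (im2col) convolution. It is not the paper's implementation: the function names, the single-channel layout, stride 1, and the absence of padding are all assumptions made for brevity, and pure Python is used only for readability, not performance.

```python
def conv2d_direct(x, w):
    """Direct convolution (cross-correlation, as used in deep learning).

    Assumptions for this sketch: x is an H x W list of lists, w is KH x KW,
    single channel, stride 1, no padding.
    """
    H, W = len(x), len(x[0])
    KH, KW = len(w), len(w[0])
    out = [[0.0] * (W - KW + 1) for _ in range(H - KH + 1)]
    for i in range(H - KH + 1):
        for j in range(W - KW + 1):
            acc = 0.0
            # Slide the kernel over the input and accumulate products.
            for di in range(KH):
                for dj in range(KW):
                    acc += x[i + di][j + dj] * w[di][dj]
            out[i][j] = acc
    return out


def im2col(x, KH, KW):
    """Unroll every KH x KW input patch into one row of a matrix."""
    H, W = len(x), len(x[0])
    return [
        [x[i + di][j + dj] for di in range(KH) for dj in range(KW)]
        for i in range(H - KH + 1)
        for j in range(W - KW + 1)
    ]


def conv2d_gemm(x, w):
    """GEMM-based convolution: im2col followed by a matrix-vector product."""
    KH, KW = len(w), len(w[0])
    out_w = len(x[0]) - KW + 1
    cols = im2col(x, KH, KW)                # (num_patches) x (KH*KW)
    w_flat = [v for row in w for v in row]  # flattened kernel, length KH*KW
    flat = [sum(a * b for a, b in zip(patch, w_flat)) for patch in cols]
    # Reshape the flat result back into output rows.
    return [flat[r:r + out_w] for r in range(0, len(flat), out_w)]
```

Both functions compute the same result. The GEMM variant spends extra memory on the unrolled patch matrix so that the arithmetic becomes a single matrix product, which optimized BLAS-style routines typically handle well on CPUs.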
