AR · Mar 23

Convolutions Predictable Offloading to an Accelerator: Formalization and Optimization

arXiv:2603.21792 · 27.2 · h-index: 25
AI Analysis

This paper addresses the challenge of real-time CNN execution on accelerators, for example in embedded systems. The contribution appears incremental, since it builds on existing decomposition methods.

The paper tackles the problem of offloading CNN computations to accelerators with limited on-chip memory. It formalizes sequences of offloading steps, applies the formalism to a state-of-the-art convolution decomposition, and encodes the search for duration-optimal strategies as a set of constraints. A Python-based simulator supports in-depth analysis of the computed strategies; no concrete performance numbers are provided.

Convolutional neural networks (CNNs) require a large number of multiply-accumulate (MAC) operations. To meet real-time constraints, they often need to be executed on specialized accelerators composed of an on-chip memory and a processing unit. However, the on-chip memory is often insufficient to store all the data required to compute a CNN layer. Thus, the computation must be performed in several offloading steps. We formalise such sequences of steps and apply our formalism to a state-of-the-art decomposition of convolutions. In order to find optimal strategies in terms of duration, we encode the problem as a set of constraints. A Python-based simulator allows in-depth analysis of the computed strategies.
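To make the idea of offloading steps concrete, the sketch below splits one convolution layer into strips of output rows so that each step fits in a fixed on-chip memory, then picks the strip height with the shortest total duration by brute force. It is an illustration under stated assumptions, not the paper's formalism, decomposition, or constraint encoding: the names `ConvLayer`, `step_footprint`, `strategy_duration`, and `best_strategy` are hypothetical, the layer uses a square kernel with stride 1 and no padding, weights are loaded once and stay on chip, transfers and computation do not overlap, and the cost parameters `t_transfer` and `t_mac` are arbitrary units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """Hypothetical layer description: square kernel, stride 1, no padding."""
    h_in: int
    w_in: int
    c_in: int
    c_out: int
    k: int

    @property
    def h_out(self) -> int:
        return self.h_in - self.k + 1

    @property
    def w_out(self) -> int:
        return self.w_in - self.k + 1


def step_footprint(layer: ConvLayer, rows: int) -> int:
    """Elements that must be on chip to produce `rows` output rows."""
    inputs = (rows + layer.k - 1) * layer.w_in * layer.c_in
    weights = layer.k * layer.k * layer.c_in * layer.c_out
    outputs = rows * layer.w_out * layer.c_out
    return inputs + weights + outputs


def strategy_duration(layer, rows, mem_capacity, t_transfer=1.0, t_mac=0.01):
    """Total duration of a strip-by-strip strategy, or None if a step does not fit.

    Assumed cost model: every element moved costs `t_transfer`,
    every MAC costs `t_mac`, and nothing overlaps.
    """
    if step_footprint(layer, rows) > mem_capacity:
        return None
    weights = layer.k * layer.k * layer.c_in * layer.c_out
    total = weights * t_transfer  # weights are loaded once and kept on chip
    remaining = layer.h_out
    while remaining > 0:
        r = min(rows, remaining)  # the last strip may be shorter
        loaded = (r + layer.k - 1) * layer.w_in * layer.c_in
        stored = r * layer.w_out * layer.c_out
        macs = stored * layer.k * layer.k * layer.c_in
        total += (loaded + stored) * t_transfer + macs * t_mac
        remaining -= r
    return total


def best_strategy(layer, mem_capacity):
    """Return (duration, rows_per_step) for the fastest feasible strip height."""
    candidates = []
    for rows in range(1, layer.h_out + 1):
        duration = strategy_duration(layer, rows, mem_capacity)
        if duration is not None:
            candidates.append((duration, rows))
    return min(candidates) if candidates else None
```

In this toy model the number of MACs and stored outputs is the same for every strategy; only the input rows reloaded at strip boundaries change, so taller strips are faster whenever the memory can hold them. Calling `best_strategy(ConvLayer(h_in=8, w_in=8, c_in=1, c_out=1, k=3), mem_capacity=60)` selects two output rows per step, and the function returns `None` when even a single-row step exceeds the capacity.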

