DC AR LG Apr 10

MATCHA: Efficient Deployment of Deep Neural Networks on Multi-Accelerator Heterogeneous Edge SoCs

Enrico Russo, Mohamed Amine Hamdi, Alessandro Ottaviano, Francesco Conti, Angelo Garofalo, Daniele Jahier Pagliari, Maurizio Palesi, Luca Benini, Alessio Burrello

arXiv:2604.09124 35.7 h-index: 23

AI Analysis

This work addresses inefficient DNN deployment on heterogeneous edge hardware, a problem relevant to developers and engineers, and represents an incremental improvement over existing compilers.

The paper tackles the deployment of deep neural networks on multi-accelerator heterogeneous edge SoCs. It presents MATCHA, a framework that optimizes scheduling and memory allocation and reduces inference latency on the MLPerf Tiny benchmark by up to 35% relative to the MATCH compiler.

Deploying DNNs on Systems-on-Chip (SoCs) with multiple heterogeneous acceleration engines is challenging, and the majority of deployment frameworks cannot fully exploit heterogeneity. We present MATCHA, a unified DNN deployment framework that generates highly concurrent schedules for parallel, heterogeneous accelerators and uses constraint programming to optimize L3/L2 memory allocation and scheduling. Using pattern matching, tiling, and mapping across individual HW units enables parallel execution and high accelerator utilization. On the MLPerf Tiny benchmark, using a SoC with two heterogeneous accelerators, MATCHA improves accelerator utilization and reduces inference latency by up to 35% with respect to the state-of-the-art MATCH compiler.
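
To make the idea of concurrent scheduling across two heterogeneous accelerators concrete, here is a toy sketch in Python. It is not MATCHA's algorithm: an exhaustive search stands in for the constraint-programming solver, memory allocation is ignored, tiles are assumed independent, and every name and latency value (`acc_a`, `acc_b`, the tile names, the cycle counts) is made up for illustration.

```python
from itertools import product

# Hypothetical per-tile latencies (arbitrary cycles) on two accelerators.
# None marks an operator the accelerator cannot execute.
LATENCY = {
    "conv_tile0": {"acc_a": 4, "acc_b": 6},
    "conv_tile1": {"acc_a": 4, "acc_b": 6},
    "dw_tile0": {"acc_a": 9, "acc_b": 3},
    "fc_tile0": {"acc_a": 2, "acc_b": None},
}
ACCELERATORS = ("acc_a", "acc_b")


def makespan(assignment):
    """Finish time when each accelerator runs its own tiles back to back
    and the two accelerators run in parallel. Returns None if a tile is
    mapped to an accelerator that does not support it."""
    busy = {acc: 0 for acc in ACCELERATORS}
    for tile, acc in assignment.items():
        cost = LATENCY[tile][acc]
        if cost is None:
            return None
        busy[acc] += cost
    return max(busy.values())


def best_assignment():
    """Try every tile-to-accelerator mapping and keep the fastest valid one.
    A real tool would use a solver instead of enumeration."""
    tiles = list(LATENCY)
    best = None
    for choice in product(ACCELERATORS, repeat=len(tiles)):
        assignment = dict(zip(tiles, choice))
        span = makespan(assignment)
        if span is not None and (best is None or span < best[0]):
            best = (span, assignment)
    return best


if __name__ == "__main__":
    span, assignment = best_assignment()
    single = makespan({tile: "acc_a" for tile in LATENCY})
    print("single accelerator:", single, "cycles")
    print("two accelerators:  ", span, "cycles", assignment)
```

With these made-up numbers, running everything on `acc_a` takes 19 cycles, while the best split across both accelerators takes 9. The gain comes from overlapping work and from placing each tile on the unit that handles it well, which is the effect the abstract attributes to heterogeneity-aware scheduling.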
