LG, AI, NE · Jan 25, 2025

On Accelerating Edge AI: Optimizing Resource-Constrained Environments

arXiv:2501.15014v2 · 20 citations · h-index: 12
Originality: Synthesis-oriented
AI Analysis

The paper addresses the challenge of accelerating deep learning for edge AI deployments. As a survey, its contribution is incremental: it summarizes existing methods rather than introducing new ones.

This survey tackles the problem of deploying AI on resource-constrained edge devices. It reviews model compression, Neural Architecture Search, and compiler frameworks as ways to balance performance against compute, memory, and energy limits, with the stated goals of reducing latency and energy use while maintaining competitive accuracy.

Resource-constrained edge deployments demand AI solutions that balance high performance with stringent compute, memory, and energy limitations. In this survey, we present a comprehensive overview of the primary strategies for accelerating deep learning models under such constraints. First, we examine model compression techniques (pruning, quantization, tensor decomposition, and knowledge distillation) that streamline large models into smaller, faster, and more efficient variants. Next, we explore Neural Architecture Search (NAS), a class of automated methods that discover architectures inherently optimized for particular tasks and hardware budgets. We then discuss compiler and deployment frameworks, such as TVM, TensorRT, and OpenVINO, which provide hardware-tailored optimizations at inference time. By integrating these three pillars into unified pipelines, practitioners can achieve multi-objective goals, including latency reduction, memory savings, and energy efficiency, all while maintaining competitive accuracy. We also highlight emerging frontiers in hierarchical NAS, neurosymbolic approaches, and advanced distillation tailored to large language models, underscoring open challenges like pre-training pruning for massive networks. Our survey offers practical insights, identifies current research gaps, and outlines promising directions for building scalable, platform-independent frameworks to accelerate deep learning models at the edge.
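
To make two of the compression techniques named above concrete, here is a minimal sketch of magnitude pruning followed by symmetric int8 quantization on a flat list of weights. It is an illustration under stated assumptions, not the survey's method: the function names (`magnitude_prune`, `quantize_int8`, `dequantize`), the 50% sparsity target, and the per-tensor scale are all hypothetical choices made for this example.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Indices ordered from smallest to largest absolute value
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid dividing by zero
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate float weights."""
    return [scale * v for v in q]


# Hypothetical weights for illustration only
weights = [0.8, -0.05, 0.3, -1.27, 0.01, 0.6]
pruned = magnitude_prune(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
```

Pruning removes the three smallest-magnitude weights, and quantization then stores each survivor as one signed byte plus a shared scale. The rounding error per weight is at most half the scale, which is the kind of accuracy-versus-size trade-off the survey describes.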

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
