Apr 24

Network Edge Inference for Large Language Models: Principles, Techniques, and Opportunities

Zhixiong Chen, Bingjie Zhu, Jiangzhou Wang, Hyundong Shin, Arumugam Nallanathan, Dusit Niyato

arXiv:2604.2290662.1

AI Analysis

For researchers and practitioners seeking to deploy LLMs on resource-constrained edge devices, this survey provides a structured overview of current approaches and future directions.

This survey identifies challenges and reviews recent techniques for deploying large language model inference at the network edge, covering system architectures, model optimization, and resource management.

Large language models (LLMs) have advanced rapidly, emerging as versatile tools across fields thanks to their exceptional language understanding, generation, and reasoning capabilities. However, performing LLM inference at the network edge remains challenging due to the models' large memory and compute demands. This survey outlines the challenges specific to LLM edge inference and provides a comprehensive overview of recent progress, covering system architectures, model optimization and deployment, and resource management and scheduling. By synthesizing state-of-the-art techniques and mapping future directions, this survey aims to unlock the potential of LLMs in resource-constrained edge environments.
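Simple arithmetic makes the memory demands mentioned in the abstract concrete. The following sketch is not taken from the survey. It assumes a hypothetical 7B-parameter decoder-only model with 32 layers, a hidden size of 4096, standard multi-head attention (no grouped-query attention), and 16-bit KV-cache entries. It also ignores activations and runtime overhead, so treat the figures as rough lower bounds.

```python
# Back-of-the-envelope memory estimate for LLM inference (illustrative only).
# Assumptions (hypothetical, not from the survey): weights plus a KV cache
# dominate memory; the model is a decoder-only transformer with standard
# multi-head attention; activations and framework overhead are ignored.

GIB = 1024 ** 3  # bytes per gibibyte


def weight_bytes(num_params: float, bits_per_param: int) -> float:
    """Bytes needed to store the model weights at a given precision."""
    return num_params * bits_per_param / 8


def kv_cache_bytes(num_layers: int, hidden_size: int, seq_len: int,
                   batch_size: int = 1, bytes_per_value: int = 2) -> int:
    """KV cache size: one K and one V tensor per layer, each of shape
    (seq_len, hidden_size) per sequence in the batch."""
    return 2 * num_layers * hidden_size * seq_len * batch_size * bytes_per_value


if __name__ == "__main__":
    params = 7e9  # hypothetical 7B-parameter model
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: {weight_bytes(params, bits) / GIB:.1f} GiB")
    kv = kv_cache_bytes(num_layers=32, hidden_size=4096, seq_len=4096)
    print(f"16-bit KV cache, 4096 tokens: {kv / GIB:.1f} GiB")
```

Under these assumptions, the 16-bit weights alone take about 13 GiB, and a single 4096-token context adds another 2 GiB of KV cache. Dropping to 4-bit weights brings the weight footprint to roughly 3.3 GiB, which shows why model optimization such as quantization matters on memory-limited edge hardware.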

