DC AI NI PF
Nov 18, 2024

Generative AI on the Edge: Architecture and Performance Evaluation

arXiv:2411.17712v1
12 citations
h-index: 6
ICC 2025 - IEEE International Conference on Communications
Originality: Synthesis-oriented
AI Analysis

The paper addresses the need for localized AI inference in remote or bandwidth-constrained 6G environments, but it is incremental in that it applies existing methods to new hardware setups.

This research tackled the problem of evaluating Generative AI models on edge devices for 6G networks by testing LLM inference on a Raspberry Pi cluster, finding that lightweight models like Yi, Phi, and Llama3 achieve 5 to 12 tokens per second with under 50% CPU and RAM usage.

6G's AI-native vision of embedding advanced intelligence in the network while bringing it closer to the user requires a systematic evaluation of Generative AI (GenAI) models on edge devices. Rapidly emerging solutions based on Open RAN (ORAN) and Network-in-a-Box strongly advocate the use of low-cost, off-the-shelf components for simpler and more efficient deployment, e.g., in provisioning rural connectivity. In this context, conceptual architecture, hardware testbeds and precise performance quantification of Large Language Models (LLMs) on off-the-shelf edge devices remain largely unexplored. This research investigates computationally demanding LLM inference on a single commodity Raspberry Pi serving as an edge testbed for ORAN. We investigate various LLMs, including small, medium and large models, on a Raspberry Pi 5 cluster using a lightweight Kubernetes distribution (K3s) with a modular prompting implementation. We study its feasibility and limitations by analyzing throughput, latency, accuracy and efficiency. Our findings indicate that CPU-only deployment of lightweight models, such as Yi, Phi, and Llama3, can effectively support edge applications, achieving a generation throughput of 5 to 12 tokens per second with less than 50% CPU and RAM usage. We conclude that GenAI on the edge offers localized inference in remote or bandwidth-constrained environments in 6G networks without reliance on cloud infrastructure.
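
The generation throughput quoted in the abstract is a ratio of tokens produced to time spent generating them. The abstract does not describe the paper's measurement code, so the following is only a minimal sketch under stated assumptions: the names `GenerationSample`, `tokens_per_second` and `mean_throughput` are hypothetical, and the numbers in the usage lines are illustrative rather than results from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class GenerationSample:
    """One inference run (hypothetical record, not taken from the paper)."""
    generated_tokens: int      # tokens emitted by the model
    generation_seconds: float  # wall-clock time spent generating them


def tokens_per_second(sample: GenerationSample) -> float:
    """Generation throughput of a single run, in tokens per second."""
    if sample.generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return sample.generated_tokens / sample.generation_seconds


def mean_throughput(samples: Iterable[GenerationSample]) -> float:
    """Aggregate throughput: total tokens divided by total generation time.

    Dividing the totals weights each run by its duration, so a few very
    short runs do not dominate the result.
    """
    runs = list(samples)
    if not runs:
        raise ValueError("at least one sample is required")
    total_tokens = sum(r.generated_tokens for r in runs)
    total_seconds = sum(r.generation_seconds for r in runs)
    if total_seconds <= 0:
        raise ValueError("total generation time must be positive")
    return total_tokens / total_seconds


# Illustrative values only (assumed, not measured).
runs = [GenerationSample(120, 12.0), GenerationSample(60, 12.0)]
print(tokens_per_second(runs[0]))  # 10.0
print(mean_throughput(runs))       # 7.5
```

Dividing totals rather than averaging per-run rates is one possible design choice; the abstract does not say which aggregation the authors used.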

