IT, AI, LG · Aug 15, 2025

CSGO: Generalized Optimization for Cold Start in Wireless Collaborative Edge LLM Systems

arXiv:2508.11287v1 · 1 citation · h-index: 11 · J Commun Inf Netw
Originality: Incremental advance
AI Analysis

This work addresses the challenge of deploying large language models on resource-constrained edge devices. Its contribution is an incremental improvement for edge AI systems.

The paper targets cold-start latency in wireless collaborative edge LLM systems. It proposes a latency-aware scheduling framework that overlaps model loading with computation and communication, and it reports a significant reduction in cold-start latency compared to baseline strategies.

While deploying large language models on edge devices promises low-latency and privacy-preserving AI services, it is hindered by limited device resources. Although pipeline parallelism facilitates distributed inference, existing approaches often ignore the cold-start latency caused by on-demand model loading. In this paper, we propose a latency-aware scheduling framework that overlaps model loading with computation and communication to minimize total inference latency. Based on device and model parameters, the framework dynamically adjusts layer partitioning and allocation to effectively hide loading time, thereby eliminating as many idle periods as possible. We formulate the problem as a Mixed-Integer Non-Linear Program and design an efficient dynamic programming algorithm to optimize model partitioning and device assignment. Experimental results show that the proposed method significantly reduces cold-start latency compared to baseline strategies.
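
The idea of hiding loading time behind computation and communication can be illustrated with a toy model. The sketch below is not the paper's formulation or algorithm: it assumes a fixed device order, contiguous layer blocks, at least one layer per device, uniform per-layer load and compute times on each device, a constant inter-device transfer time, and that every device starts loading its layers at time 0. All names (`stage_finish`, `pipeline_latency`, `best_split`) and all numbers are hypothetical.

```python
def stage_finish(arrival, n_layers, load, comp, overlap=True):
    """Time at which one device finishes its layers, given when its input arrives."""
    if not overlap:
        # Baseline: wait until every assigned layer is loaded, then compute.
        return max(arrival, n_layers * load) + n_layers * comp
    t = arrival
    for j in range(1, n_layers + 1):
        # Layer j is resident at j * load (layers load one by one from t = 0),
        # so it can run as soon as both it and the previous output are ready.
        t = max(t, j * load) + comp
    return t


def pipeline_latency(split, load, comp, comm, overlap=True):
    """End-to-end latency of one forward pass for a given layer split."""
    t = 0.0
    for k, n in enumerate(split):
        if k > 0:
            t += comm  # activation transfer from the previous device
        t = stage_finish(t, n, load[k], comp[k], overlap)
    return t


def best_split(num_layers, load, comp, comm):
    """Dynamic program over (device index, number of layers placed so far)."""
    num_dev = len(load)
    inf = float("inf")
    # best[k][l]: earliest time the first l layers are done using devices 0..k.
    best = [[inf] * (num_layers + 1) for _ in range(num_dev)]
    choice = [[0] * (num_layers + 1) for _ in range(num_dev)]
    for l in range(1, num_layers + 1):
        best[0][l] = stage_finish(0.0, l, load[0], comp[0])
    for k in range(1, num_dev):
        for l in range(k + 1, num_layers + 1):
            for m in range(k, l):  # m layers sit on devices 0..k-1
                t = stage_finish(best[k - 1][m] + comm, l - m, load[k], comp[k])
                if t < best[k][l]:
                    best[k][l], choice[k][l] = t, m
    # Walk the recorded choices backwards to recover the per-device layer counts.
    split, l = [], num_layers
    for k in range(num_dev - 1, 0, -1):
        m = choice[k][l]
        split.append(l - m)
        l = m
    split.append(l)
    return best[num_dev - 1][num_layers], split[::-1]


# Hypothetical per-layer load/compute times (seconds) for three devices.
load = [0.5, 0.3, 0.8]
comp = [0.1, 0.2, 0.1]
latency, split = best_split(12, load, comp, comm=0.05)
even_no_overlap = pipeline_latency([4, 4, 4], load, comp, 0.05, overlap=False)
print(split, latency, even_no_overlap)
```

The dynamic program is exact for this toy model because a stage's finish time never decreases when its input arrives later, so the earliest arrival at each device is always the best one to extend. A realistic setting would also need to account for memory limits, wireless channel variation, and the choice of which device takes which stage.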

