DC DB LG Nov 27, 2025

DisCEdge: Distributed Context Management for Large Language Models at the Edge

Mohammadreza Malekabbasi, Minghe Wang, David Bermbach

arXiv:2511.22599v11.21 citations Has Code

Originality: Incremental advance

AI Analysis

This work addresses latency and privacy problems for edge-deployed LLM services, offering an incremental improvement over existing context management solutions.

The paper tackles user-context management for Large Language Models (LLMs) at the edge, where the stateless nature of LLMs leads to added latency and overhead. It proposes DisCEdge, a distributed context management system that improves median response times by up to 14.46% and reduces median inter-node synchronization overhead by up to 15% compared to a raw-text-based system, and cuts client request sizes by a median of 90% compared to client-side context management.

Deploying Large Language Model (LLM) services at the edge benefits latency-sensitive and privacy-aware applications. However, the stateless nature of LLMs makes managing user context (e.g., sessions, preferences) across geo-distributed edge nodes challenging. Existing solutions, such as client-side context storage, often introduce network latency and bandwidth overhead, undermining the advantages of edge deployment. We propose DisCEdge, a distributed context management system that stores and replicates user context in tokenized form across edge nodes. By maintaining context as token sequences rather than raw text, our system avoids redundant computation and enables efficient data replication. We implement and evaluate an open-source prototype in a realistic edge environment with commodity hardware. We show DisCEdge improves median response times by up to 14.46% and lowers median inter-node synchronization overhead by up to 15% compared to a raw-text-based system. It also reduces client request sizes by a median of 90% compared to client-side context management, while guaranteeing data consistency.
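
The core idea in the abstract, keeping context as token sequences so earlier turns are not tokenized again, can be illustrated with a minimal Python sketch. This is not the DisCEdge implementation: `ToyTokenizer`, `TokenContextStore`, and `RawTextContextStore` are hypothetical names, the tokenizer is a whitespace-based stand-in for a real LLM tokenizer, and replication across edge nodes is left out entirely. The sketch only counts how much tokenization work each approach performs.

```python
from dataclasses import dataclass, field


class ToyTokenizer:
    """Hypothetical stand-in for an LLM tokenizer: one ID per whitespace-separated word."""

    def __init__(self) -> None:
        self.vocab: dict[str, int] = {}
        self.words_encoded = 0  # running count of tokenization work

    def encode(self, text: str) -> list[int]:
        ids = [self.vocab.setdefault(w, len(self.vocab)) for w in text.split()]
        self.words_encoded += len(ids)
        return ids


@dataclass
class TokenContextStore:
    """Stores each session's context as token IDs; only the new turn is tokenized."""

    tokenizer: ToyTokenizer
    sessions: dict[str, list[int]] = field(default_factory=dict)

    def append_turn(self, session_id: str, text: str) -> list[int]:
        context = self.sessions.setdefault(session_id, [])
        context.extend(self.tokenizer.encode(text))
        return context


@dataclass
class RawTextContextStore:
    """Stores raw text; the whole history is tokenized again on every request."""

    tokenizer: ToyTokenizer
    sessions: dict[str, str] = field(default_factory=dict)

    def append_turn(self, session_id: str, text: str) -> list[int]:
        history = (self.sessions.get(session_id, "") + " " + text).strip()
        self.sessions[session_id] = history
        return self.tokenizer.encode(history)


turns = ["hello there", "what is edge computing", "give me an example"]
token_store = TokenContextStore(ToyTokenizer())
text_store = RawTextContextStore(ToyTokenizer())

for turn in turns:
    token_ctx = token_store.append_turn("s1", turn)
    text_ctx = text_store.append_turn("s1", turn)

# Both stores yield the same model input, but with different tokenization work.
print(token_ctx == text_ctx)
print(token_store.tokenizer.words_encoded, text_store.tokenizer.words_encoded)
```

In this toy setting the token-based store tokenizes each word once, while the raw-text store's work grows with the length of the history on every turn. A word-level tokenizer gives identical IDs whether text is encoded turn by turn or as one string; real subword tokenizers may behave differently at turn boundaries, which this sketch does not model.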
