CV | AI | Aug 18, 2025

edgeVLM: Cloud-edge Collaborative Real-time VLM based on Context Transfer

arXiv:2508.12638v2 | h-index: 3
Originality: Incremental advance
AI Analysis

The paper addresses latency-aware collaboration for real-time VLM systems such as autonomous driving, although it appears incremental because it builds on existing cloud-edge architectures.

The paper tackles latency fluctuations in cloud-edge collaborative Vision-Language Models (VLMs) for real-time applications. It proposes a Context Transfer paradigm that uses the delayed outputs of Large VLMs (LVLMs) as historical context to guide Small VLMs (SVLMs), and reports that the approach is effective on three tasks across four datasets.

Vision-Language Models (VLMs) are increasingly deployed in real-time applications such as autonomous driving and human-computer interaction, which demand fast and reliable responses based on accurate perception. To meet these requirements, existing systems commonly employ cloud-edge collaborative architectures, such as partitioned Large Vision-Language Models (LVLMs) or task offloading strategies between LVLMs and Small Vision-Language Models (SVLMs). However, these methods fail to accommodate cloud latency fluctuations and overlook the full potential of delayed but accurate LVLM responses. In this work, we propose a novel cloud-edge collaborative paradigm for VLMs, termed Context Transfer, which treats the delayed outputs of LVLMs as historical context to provide real-time guidance for SVLM inference. Based on this paradigm, we design edgeVLM, which incorporates both context replacement and visual focus modules to refine historical textual input and enhance visual grounding consistency. Extensive experiments on three real-time vision-language reasoning tasks across four datasets demonstrate the effectiveness of the proposed framework. The new paradigm lays the groundwork for more effective and latency-aware collaboration strategies in future VLM systems.
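The Context Transfer idea described in the abstract can be illustrated with a toy control loop. This is a minimal sketch under stated assumptions, not the edgeVLM implementation: the names `CloudContext`, `run_edge_loop`, `toy_large`, and `toy_small` are hypothetical, the models are string-producing stubs, cloud latency is simulated as a fixed number of frames, and the paper's context replacement and visual focus modules are not modeled. The only point shown is that the small model answers every frame immediately and uses whichever large-model output has already arrived as historical context.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class CloudContext:
    frame_index: int  # frame the large model actually processed
    text: str         # its accurate but late response


def run_edge_loop(
    frames: List[str],
    small_model: Callable[[str, Optional[CloudContext]], str],
    large_model: Callable[[str], str],
    cloud_delay: int,
) -> List[str]:
    """Answer every frame on the edge, guided by the newest cloud output that has arrived."""
    in_flight: List[Tuple[int, CloudContext]] = []  # (arrival step, response)
    latest: Optional[CloudContext] = None
    answers: List[str] = []
    for t, frame in enumerate(frames):
        # Deliver cloud responses whose simulated latency has elapsed.
        for arrival, ctx in in_flight:
            if arrival <= t:
                latest = ctx
        in_flight = [(a, c) for a, c in in_flight if a > t]
        # The edge model never waits: it uses stale context or none at all.
        answers.append(small_model(frame, latest))
        # Send the current frame to the cloud; the reply lands cloud_delay steps later.
        in_flight.append((t + cloud_delay, CloudContext(t, large_model(frame))))
    return answers


# Hypothetical stand-ins for the large (cloud) and small (edge) models.
def toy_large(frame: str) -> str:
    return f"detailed({frame})"


def toy_small(frame: str, ctx: Optional[CloudContext]) -> str:
    if ctx is None:
        return f"quick({frame})"
    return f"quick({frame}) | context from frame {ctx.frame_index}: {ctx.text}"


if __name__ == "__main__":
    for line in run_edge_loop(["f0", "f1", "f2", "f3"], toy_small, toy_large, cloud_delay=2):
        print(line)
```

With a delay of two frames, the first two answers carry no context, and each later answer is guided by the large model's response to the frame two steps earlier. A real system would see variable latency rather than the fixed delay assumed here.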

