DC · Apr 3

MSAO: Adaptive Modality Sparsity-Aware Offloading with Edge-Cloud Collaboration for Efficient Multimodal LLM Inference

arXiv:2604.02945 · 74.8
Predicted impact: top 9% in DC · last 90 days
Originality: Incremental advance
AI Analysis

This paper addresses the efficiency of deploying multimodal LLMs on edge devices. It is an incremental advance that combines modality sparsity analysis with speculative edge-cloud offloading.

The paper tackles the computational and latency challenges of deploying multimodal large language models on resource-constrained edge devices by proposing MSAO, an adaptive offloading framework with edge-cloud collaboration, which achieves a 30% reduction in end-to-end latency, 30%-65% decrease in resource overhead, and 1.5x to 2.3x throughput improvement while maintaining competitive accuracy.

Multimodal large language models (MLLMs) enable powerful cross-modal reasoning but impose substantial computational and latency burdens, which makes deployment on resource-constrained edge devices challenging. In this paper, we propose MSAO, an adaptive modality sparsity-aware offloading framework with edge-cloud collaboration for efficient MLLM inference. First, a lightweight heterogeneous modality-aware module based on fine-grained sparsity performs joint spatial-temporal-modal analysis to compute the Modality Activation Sparsity (MAS) metric, which quantifies the necessity of each modality with minimal computational overhead. Second, an adaptive speculative edge-cloud collaborative offloading mechanism dynamically schedules workloads between edge and cloud based on the derived MAS scores and real-time system states, using confidence-guided speculative execution to hide communication latency. Extensive experiments on the VQAv2 and MMBench benchmarks show that MSAO achieves a 30% reduction in end-to-end latency and a 30%-65% decrease in resource overhead, while improving throughput by 1.5x to 2.3x compared to traditional approaches and maintaining competitive accuracy.
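
The abstract does not give the MAS formula or the scheduling rule, so the following is only a minimal illustrative sketch of the general idea: score each modality by how sparse its activations are, then use that score together with the current system state to pick a placement. Everything here is an assumption made for illustration, not the paper's method: the near-zero-fraction definition of sparsity, the names `activation_sparsity`, `SystemState`, and `place_modality`, and all thresholds are hypothetical.

```python
from dataclasses import dataclass


def activation_sparsity(activations, eps=1e-3):
    """Fraction of activations with magnitude below eps.

    Assumed stand-in for a sparsity score: 0.0 means dense,
    1.0 means fully sparse. Not the paper's MAS definition.
    """
    if not activations:
        return 1.0  # no signal at all counts as fully sparse
    near_zero = sum(1 for a in activations if abs(a) < eps)
    return near_zero / len(activations)


@dataclass
class SystemState:
    bandwidth_mbps: float  # measured uplink bandwidth in megabits/s
    edge_load: float       # edge utilisation in [0, 1]


def place_modality(sparsity, payload_mb, state,
                   sparsity_cutoff=0.9, max_upload_s=0.2):
    """Return 'skip', 'edge', or 'cloud' for one modality.

    Hypothetical rule: drop nearly silent modalities, offload to the
    cloud only when the upload is cheap and the edge is busy,
    otherwise keep the work on the edge device.
    """
    if sparsity >= sparsity_cutoff:
        return "skip"
    upload_s = payload_mb * 8 / state.bandwidth_mbps  # megabytes to megabits
    if upload_s <= max_upload_s and state.edge_load > 0.5:
        return "cloud"
    return "edge"


if __name__ == "__main__":
    busy = SystemState(bandwidth_mbps=100.0, edge_load=0.8)
    image_score = activation_sparsity([0.0, 0.0, 0.5, 1.0])
    print(image_score, place_modality(image_score, 1.0, busy))
```

The sketch leaves out confidence-guided speculative execution, which the abstract names but does not describe in enough detail to illustrate.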