DC · RO · Mar 9

RAPID: Redundancy-Aware and Compatibility-Optimal Edge-Cloud Partitioned Inference for Diverse VLA models

Zihao Zheng, Sicheng Tian, Hangyu Cao, Chenyue Li, Jiayu Chen, Maoliang Li, Xinhao Sun, Hailong Zou, Guojie Luo, Xiang Chen

arXiv:2603.07949v1 · 13.64 citations · h-index: 2

Predicted impact: top 7% in DC · last 90 days

Originality: Incremental advance

AI Analysis

This work provides a significant speedup for VLA model inference, which is crucial for real-time embodied intelligence applications.

This paper addresses the high inference costs of Vision Language Action (VLA) models in embodied intelligence by proposing RAPID, an Edge-Cloud Collaborative (ECC) inference framework. RAPID achieves a speedup of up to 1.73x with only 5-7% overhead by optimizing partitioning for VLA models.

Vision Language Action (VLA) models are mainstream in embodied intelligence but face high inference costs. Edge-Cloud Collaborative (ECC) inference offers an effective fix by easing edge-device computing pressure to meet real-time needs. However, existing ECC frameworks are suboptimal for VLA models due to two challenges: (1) mainstream environment-oriented edge-cloud partitioning methods are susceptible to interference from visual noise; (2) existing edge-cloud partitioning methods overlook the step-wise redundancy unique to embodied tasks, thereby disrupting the physical continuity of motion. To address these issues, we propose a novel ECC inference framework, termed RAPID, and develop an implementation tailored to it. Experiments demonstrate that this implementation achieves a speedup of up to 1.73x with only 5% to 7% overhead.
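To put the reported figure in terms of latency, the short sketch below converts the 1.73x upper bound into a fraction of baseline latency. It assumes that "speedup" means baseline latency divided by RAPID latency, which the abstract does not state explicitly; the helper name `latency_fraction` is hypothetical and used only for illustration.

```python
def latency_fraction(speedup: float) -> float:
    """Fraction of baseline latency that remains after a given speedup.

    Assumes speedup = baseline_latency / new_latency (an assumption,
    not a definition taken from the paper).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


reported_speedup = 1.73  # upper bound quoted in the abstract
remaining = latency_fraction(reported_speedup)
reduction = 1.0 - remaining

print(f"remaining latency: {remaining:.1%}, reduction: {reduction:.1%}")
```

Under that assumption, the best case corresponds to roughly 58% of the baseline latency, a reduction of about 42%. The 5% to 7% overhead is not folded into this calculation, because the abstract does not say what quantity it is measured against.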
