RoboECC: Multi-Factor-Aware Edge-Cloud Collaborative Deployment for VLA Models
The work addresses real-time performance issues for VLA models in embodied intelligence, though the contribution is incremental because it builds on existing edge-cloud collaborative frameworks.
The paper tackles the high inference cost of Vision-Language-Action (VLA) models by proposing RoboECC, an edge-cloud collaborative deployment framework. RoboECC combines a model-hardware co-aware segmentation strategy with network-aware deployment adjustment, and reports a speedup of up to 3.28x with only 2.55x~2.62x overhead.
Vision-Language-Action (VLA) models are mainstream in embodied intelligence but face high inference costs. Edge-Cloud Collaborative (ECC) deployment offers an effective remedy by relieving the computing pressure on edge devices so that real-time requirements can be met. However, existing ECC frameworks are suboptimal for VLA models because of two challenges: (1) diverse model structures make it hard to identify the optimal ECC segmentation point; (2) even once the optimal segmentation point is determined, changes in network bandwidth can cause performance drift. To address these issues, we propose a novel ECC deployment framework for various VLA models, termed RoboECC. Specifically, we propose a model-hardware co-aware segmentation strategy that helps find the optimal segmentation point for various VLA models. Moreover, we propose a network-aware deployment adjustment approach that adapts to network fluctuations to maintain optimal performance. Experiments demonstrate that RoboECC achieves a speedup of up to 3.28x with only 2.55x~2.62x overhead.
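To make the two challenges concrete, the following is a minimal sketch of how a segmentation point might be chosen and then re-chosen when bandwidth changes. It is not RoboECC's actual algorithm, which the abstract does not specify. The per-layer latency profile (`LayerProfile`), the additive cost model in `end_to_end_ms`, and all numbers are illustrative assumptions; the sketch also ignores the cost of sending the action output back to the edge.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LayerProfile:
    """Hypothetical per-layer profile; real values would come from measurement."""
    edge_ms: float   # latency of this layer on the edge device
    cloud_ms: float  # latency of this layer on the cloud server
    out_bytes: int   # activation size uploaded if the model is split after this layer


def end_to_end_ms(layers: Sequence[LayerProfile], split: int,
                  input_bytes: int, bandwidth_mbps: float) -> float:
    """Assumed cost model: layers[:split] run on the edge, layers[split:] on the cloud."""
    edge = sum(layer.edge_ms for layer in layers[:split])
    cloud = sum(layer.cloud_ms for layer in layers[split:])
    if split == len(layers):
        transfer = 0.0  # everything runs on the edge, nothing is uploaded
    else:
        payload = input_bytes if split == 0 else layers[split - 1].out_bytes
        transfer = payload * 8 / (bandwidth_mbps * 1e6) * 1e3  # milliseconds
    return edge + transfer + cloud


def best_split(layers: Sequence[LayerProfile], input_bytes: int,
               bandwidth_mbps: float) -> int:
    """Exhaustively pick the split index with the lowest estimated latency."""
    candidates = range(len(layers) + 1)
    return min(candidates,
               key=lambda k: end_to_end_ms(layers, k, input_bytes, bandwidth_mbps))


# Illustrative three-layer model: the edge is 10x slower than the cloud,
# and activations shrink as the model gets deeper.
layers = [
    LayerProfile(edge_ms=10, cloud_ms=1, out_bytes=1_000_000),
    LayerProfile(edge_ms=20, cloud_ms=2, out_bytes=100_000),
    LayerProfile(edge_ms=40, cloud_ms=4, out_bytes=10_000),
]
input_bytes = 2_000_000

# Re-running the search as measured bandwidth changes moves the split point.
for mbps in (1000.0, 100.0, 1.0):
    print(mbps, best_split(layers, input_bytes, mbps))
# prints:
# 1000.0 0
# 100.0 2
# 1.0 3
```

Under these assumed numbers, a fast link favors sending the raw input to the cloud (split 0), a moderate link favors splitting after the second layer, and a very slow link favors running everything on the edge. This is the kind of drift in the optimal split that a network-aware adjustment step would need to track.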