AsyncVLA: An Asynchronous VLA for Fast and Robust Navigation on the Edge
This work addresses real-time deployment of robotic navigation models on edge hardware, making large foundation models safer and more effective in dynamic settings. The contribution is incremental, since it builds on established hierarchical control and finetuning methods.
The paper tackles the high inference latency of robotic foundation models in dynamic environments by proposing AsyncVLA, an asynchronous control framework that decouples semantic reasoning from reactive execution. On real-world vision-based navigation tasks with communication delays of up to 6 seconds, it achieves a 40% higher success rate than state-of-the-art baselines.
Robotic foundation models achieve strong generalization by leveraging internet-scale vision-language representations, but their massive computational cost creates a fundamental bottleneck: high inference latency. In dynamic environments, this latency breaks the control loop, rendering powerful models unsafe for real-time deployment. We propose AsyncVLA, an asynchronous control framework that decouples semantic reasoning from reactive execution. Inspired by hierarchical control, AsyncVLA runs a large foundation model on a remote workstation to provide high-level guidance, while a lightweight, onboard Edge Adapter continuously refines actions at high frequency. To bridge the domain gap between these asynchronous streams, we introduce an end-to-end finetuning protocol and a trajectory re-weighting strategy that prioritizes dynamic interactions. We evaluate our approach on real-world vision-based navigation tasks with communication delays up to 6 seconds. AsyncVLA achieves a 40% higher success rate than state-of-the-art baselines, effectively bridging the gap between the semantic intelligence of large models and the reactivity required for edge robotics.
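The decoupling described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' implementation: the 1D world, the names `remote_model`, `edge_adapter`, and `simulate`, and all parameter values are assumptions made for illustration. It shows only the control-flow idea, in which guidance arrives several control steps late while an onboard step runs every tick against the current observation.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Guidance:
    """High-level guidance from the (hypothetical) remote model."""
    target: float    # waypoint the remote model wants the robot to reach
    issued_at: int   # control step whose observation produced this guidance


def remote_model(position: float, goal: float) -> float:
    """Stand-in for the large remote model: proposes the goal as a waypoint."""
    return goal


def edge_adapter(position: float, obstacle_gap: float, guidance: Guidance,
                 max_speed: float = 1.0, safe_gap: float = 0.5) -> float:
    """Stand-in for the onboard adapter: follows possibly stale guidance,
    but reacts to the current obstacle reading at every control step."""
    error = guidance.target - position
    speed = max(-max_speed, min(max_speed, error))
    if speed > 0 and obstacle_gap < safe_gap:
        return 0.0  # stop for an obstacle that appeared after guidance was issued
    return speed


def simulate(steps: int, delay_steps: int, goal: float,
             blocked_steps: Set[int], dt: float = 0.5
             ) -> List[Tuple[int, float, float, int]]:
    """Run the toy loop; returns (step, position, velocity, guidance_age) rows."""
    position = 0.0
    guidance = Guidance(target=position, issued_at=0)  # hold still until guidance arrives
    pending: Optional[Tuple[int, Guidance]] = None     # (arrival_step, reply in flight)
    log = []
    for t in range(steps):
        # Slow path: a reply arrives delay_steps after its request was sent.
        if pending is not None and pending[0] <= t:
            guidance, pending = pending[1], None
        if pending is None:
            reply = Guidance(remote_model(position, goal), issued_at=t)
            pending = (t + delay_steps, reply)
        # Fast path: runs every step on the current observation.
        gap = 0.1 if t in blocked_steps else float("inf")
        velocity = edge_adapter(position, gap, guidance)
        position += velocity * dt
        log.append((t, position, velocity, t - guidance.issued_at))
    return log


if __name__ == "__main__":
    for row in simulate(steps=12, delay_steps=3, goal=2.0, blocked_steps={4, 5}):
        print(row)
```

In this toy run the robot stops during the blocked steps even though the guidance it holds is several steps old, then resumes toward the waypoint. A real system would replace both stand-in functions with learned models and run the slow path over a network link; none of that is modeled here.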