LG · RO · Mar 9

DyQ-VLA: Temporal-Dynamic-Aware Quantization for Embodied Vision-Language-Action Models

arXiv:2603.07904v1 · 4 citations
Predicted impact: top 7% in LG · last 90 days
Originality: Highly original
AI Analysis

This work provides a method to reduce the computational cost of embodied VLA models, which is significant for deploying these models on edge devices with limited resources.

This paper addresses the inference overheads of Vision-Language-Action (VLA) models by proposing DyQ-VLA, a dynamic quantization framework. It reduces the memory footprint to 30.9% of the original while maintaining 99.5% performance, leading to 1.49x simulation and up to 1.43x real-world speedups.

Vision-Language-Action (VLA) models are dominant in embodied intelligence but are constrained by inference overheads. While model quantization alleviates these bottlenecks for edge deployment, static quantization approaches remain suboptimal for VLAs due to two critical challenges: (1) Temporal-dynamic sensitivity, where fixed precision wastes resources by ignoring stage-varying error tolerances; and (2) Real-time allocation, where identifying real-time sensitivity to guide bit allocation remains unsolved. To address these challenges, we propose DyQ-VLA, a dynamic quantization framework for VLAs. Specifically, a sensitivity-aware switching strategy leverages real-time kinematic proxies to trigger the bit-width switch, while a kinematic-guided module dynamically allocates the optimal bit-width. Experiments show that DyQ-VLA requires only 30.9% of the original memory footprint while maintaining 99.5% of its original performance, achieving 1.49x simulation and up to 1.43x real-world speedups.
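The switching idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the proxy (the norm of the change between consecutive action vectors), the thresholds, the 8-bit/4-bit choices, and the names `motion_proxy` and `BitWidthSwitcher` are all hypothetical. It also assumes that slow, fine-grained motion is the more error-sensitive stage, which the abstract does not state.

```python
import math
from typing import Sequence


def motion_proxy(prev_action: Sequence[float], action: Sequence[float]) -> float:
    """Hypothetical kinematic proxy: L2 norm of the change between two actions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(action, prev_action)))


class BitWidthSwitcher:
    """Two-level bit-width switch with hysteresis (illustrative values only)."""

    def __init__(self, low: float = 0.05, high: float = 0.10,
                 fine_bits: int = 8, coarse_bits: int = 4) -> None:
        if low > high:
            raise ValueError("low threshold must not exceed high threshold")
        self.low = low
        self.high = high
        self.fine_bits = fine_bits
        self.coarse_bits = coarse_bits
        # Start at the higher precision until the proxy indicates coarse motion.
        self.bits = fine_bits

    def update(self, proxy: float) -> int:
        """Return the bit-width to use for the next inference step."""
        if self.bits == self.fine_bits and proxy > self.high:
            # Assumed: large motion tolerates more quantization error.
            self.bits = self.coarse_bits
        elif self.bits == self.coarse_bits and proxy < self.low:
            # Assumed: slow, precise motion needs higher precision.
            self.bits = self.fine_bits
        return self.bits


if __name__ == "__main__":
    switcher = BitWidthSwitcher()
    trajectory = [[0.0, 0.0], [0.01, 0.0], [0.3, 0.2], [0.35, 0.25], [0.36, 0.25]]
    for prev, cur in zip(trajectory, trajectory[1:]):
        print(switcher.update(motion_proxy(prev, cur)))
```

The gap between the two thresholds keeps the bit-width from flipping on every step when the proxy hovers near a single cutoff; whether DyQ-VLA uses hysteresis, or more than two precision levels, is not specified in the abstract.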

