AI | Sep 27, 2025

Transferring Vision-Language-Action Models to Industry Applications: Architectures, Performance, and Challenges

Shuai Li, Chen Yizhe, Li Dong, Liu Sichao, Lan Dapeng, Liu Yu, Zhibo Pang

arXiv:2509.23121v1 | 3 citations | h-index: 3 | 2025 7th International Conference on Industrial Artificial Intelligence (IAI)

AI Analysis

This work assesses how well VLA models adapt to industrial applications and offers incremental insights into their deployment challenges.

The paper evaluated state-of-the-art vision-language-action (VLA) models in industrial scenarios, finding they perform well on simple grasping tasks after fine-tuning but show significant room for improvement in complex environments, diverse objects, and high-precision placing tasks.

The application of artificial intelligence (AI) in industry is accelerating the shift from traditional automation to intelligent systems with perception and cognition. Vision-language-action (VLA) models have become a key paradigm in AI for unifying perception, reasoning, and control. Has the performance of VLA models met industrial requirements? In this paper, from the perspective of industrial deployment, we compare the performance of existing state-of-the-art VLA models in industrial scenarios and analyze their limitations for real-world industrial deployment in terms of data collection and model architecture. The results show that, after fine-tuning, VLA models retain their ability to perform simple grasping tasks even in industrial settings. However, there is much room for improvement in complex industrial environments, diverse object categories, and high-precision placing tasks. Our findings provide practical insight into the adaptability of VLA models for industrial use and highlight the need for task-specific enhancements to improve their robustness, generalization, and precision.
