RO · May 26

Can VLA Models Learn from Real-World Data Continually without Forgetting?

arXiv:2605.26820 · 68.8
Predicted impact: top 26% in RO · last 90 days
Originality: Incremental advance
AI Analysis

For robotics researchers aiming to deploy VLA models that continuously acquire new skills without forgetting, this work offers the first real-world empirical insights and practical guidance.

This paper studies continual learning for vision-language-action (VLA) models in real-world robotics, finding that they suffer significant catastrophic forgetting when learning heterogeneous tasks sequentially. It provides the first empirical study on this topic and identifies key factors for successful experience replay.

Vision-language-action (VLA) models provide a promising foundation for general-purpose robotics. However, their successful deployment in real-world scenarios requires the ability to continually acquire new skills while retaining previously learned behaviors. While pioneering research has studied the continual learning of VLA models in narrowly simulated environments, this challenge remains largely unexplored under realistic conditions. To address this limitation, we construct a real-world continual learning dataset comprising four sequential manipulation tasks, spanning rigid-object pick-and-place, contact-rich pressing, and deformable-object folding. Using this dataset, we conduct comprehensive experiments and find that VLA models suffer significant catastrophic forgetting when continually learning from heterogeneous real-world demonstrations. We then systematically evaluate experience replay and uncover key implementation factors that govern its success. In summary, this work provides the first empirical study of real-world continual VLA learning and offers practical guidance for deploying long-lived robot policies.
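Experience replay, as mentioned in the abstract, generally means mixing stored demonstrations from earlier tasks into the training batches for a new task. The following is a minimal generic sketch of that idea, not the authors' implementation: the function name `sample_replay_batch`, the `replay_ratio` parameter, and the tuple-based sample format are all hypothetical and chosen only for illustration.

```python
import random


def sample_replay_batch(new_task_data, replay_buffer, batch_size, replay_ratio, rng=None):
    """Build one training batch that mixes new-task and replayed samples.

    All names here are illustrative assumptions, not taken from the paper.
    replay_ratio is the fraction of the batch drawn from earlier tasks.
    """
    rng = rng or random.Random()
    # With an empty buffer (first task in the sequence) nothing can be replayed.
    n_replay = int(batch_size * replay_ratio) if replay_buffer else 0
    n_new = batch_size - n_replay
    # Sample with replacement so small datasets still fill the batch.
    batch = [rng.choice(new_task_data) for _ in range(n_new)]
    batch += [rng.choice(replay_buffer) for _ in range(n_replay)]
    rng.shuffle(batch)
    return batch


# Hypothetical usage: demonstrations are tagged with their task name.
new_demos = [("fold", i) for i in range(10)]
old_demos = [("pick", i) for i in range(10)]
batch = sample_replay_batch(new_demos, old_demos, batch_size=8, replay_ratio=0.25)
```

In this sketch the replay ratio and the buffer contents are the main tunable choices; which implementation factors actually matter for real-world VLA training is what the paper reports investigating.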

