Multi-Task Interactive Robot Fleet Learning with Visual World Models
This work addresses the challenge of deploying AI-enabled robots in real-world environments marked by variability and uncertainty, and represents an incremental advance in multi-task robot learning.
The paper tackles generalization and robustness in multi-task robot fleets for household and industrial settings. It introduces Sirius-Fleet, a framework that uses visual world models and anomaly predictors to monitor robot performance and bring in humans for corrections. The result is improved policy performance and a human workload that decreases over time.
Recent advancements in large-scale multi-task robot learning offer the potential for deploying robot fleets in household and industrial settings, enabling them to perform diverse tasks across various environments. However, AI-enabled robots often face challenges with generalization and robustness when exposed to real-world variability and uncertainty. We introduce Sirius-Fleet, a multi-task interactive robot fleet learning framework to address these challenges. Sirius-Fleet monitors robot performance during deployment and involves humans to correct the robot's actions when necessary. We employ a visual world model to predict the outcomes of future actions and build anomaly predictors to predict whether those outcomes will likely result in anomalies. As robot autonomy improves, the anomaly predictors automatically adapt their prediction criteria, leading to fewer requests for human intervention and gradually reducing human workload over time. Evaluations on two diverse, large-scale multi-task benchmarks, RoboCasa in simulation and Mutex in the real world, demonstrate Sirius-Fleet's effectiveness in improving multi-task policy performance and monitoring accuracy. More information is available on the project website: https://ut-austin-rpl.github.io/sirius-fleet
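
The monitoring loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the world model here is a stand-in function, the anomaly score (distance from predicted future embeddings to the nearest embedding seen in successful data) and the quantile-based threshold update are assumptions chosen only to make the idea concrete, and every name (AnomalyMonitor, toy_world_model, needs_human, adapt) is hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def anomaly_score(predicted: List[Vector], nominal: List[Vector]) -> float:
    """Worst-case distance from any predicted future embedding to its
    nearest nominal embedding (assumed scoring rule, for illustration)."""
    return max(min(euclidean(p, n) for n in nominal) for p in predicted)


def quantile_threshold(scores: List[float], q: float = 0.95) -> float:
    """Pick the q-th quantile of the scores as the decision threshold."""
    ordered = sorted(scores)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


def toy_world_model(embedding: Vector, actions: List[Vector]) -> List[Vector]:
    """Hypothetical stand-in for a learned visual world model: each action
    simply shifts the current embedding."""
    current, rollout = list(embedding), []
    for action in actions:
        current = [c + a for c, a in zip(current, action)]
        rollout.append(tuple(current))
    return rollout


class AnomalyMonitor:
    """Requests human help when predicted futures look unlike nominal data."""

    def __init__(self, world_model: Callable[[Vector, List[Vector]], List[Vector]],
                 nominal: List[Vector], threshold: float) -> None:
        self.world_model = world_model
        self.nominal = list(nominal)
        self.threshold = threshold

    def needs_human(self, embedding: Vector, planned_actions: List[Vector]) -> bool:
        predicted = self.world_model(embedding, planned_actions)
        return anomaly_score(predicted, self.nominal) > self.threshold

    def adapt(self, new_nominal: List[Vector], q: float = 0.95) -> None:
        """After a deployment round, absorb embeddings from new successful
        rollouts and refit the threshold on leave-one-out nominal scores."""
        self.nominal.extend(new_nominal)
        scores = [
            min(euclidean(e, o) for j, o in enumerate(self.nominal) if j != i)
            for i, e in enumerate(self.nominal)
        ]
        self.threshold = quantile_threshold(scores, q)


monitor = AnomalyMonitor(toy_world_model, [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], 0.5)
familiar = monitor.needs_human((0.0, 0.0), [(1.0, 0.0), (1.0, 0.0)])
novel_before = monitor.needs_human((0.0, 0.0), [(0.0, 2.0)])
monitor.adapt([(0.0, 1.0), (0.0, 2.0)])
novel_after = monitor.needs_human((0.0, 0.0), [(0.0, 2.0)])
```

In this toy run, the same plan that triggers a request for help before the update no longer does once embeddings from successful rollouts of that behavior have been absorbed, which mirrors the reduction in intervention requests that the abstract describes.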