RO | Apr 14

iTeach: In the Wild Interactive Teaching for Failure-Driven Adaptation of Robot Perception

Jishnu Jaykumar P, Cole Salvato, Vinaya Bomnale, Jikai Wang, Yu Xiang

arXiv:2410.09072 | 62.01 citations | h-index: 4

AI Analysis

For roboticists deploying perception models in unstructured environments, iTeach provides a practical method to adapt models at deployment time with minimal human effort, addressing the bottleneck of out-of-distribution failures.

iTeach introduces a failure-driven interactive teaching framework that adapts robot perception models in real-world deployment by having humans identify failures, perform short interactions, and annotate only the final frame using eye-gaze and voice commands. The method improves unseen object instance segmentation by 5-10% mAP on average across diverse scenes, leading to higher grasping success rates (e.g., 15% improvement on SceneReplica).

Robotic perception models often fail when deployed in real-world environments due to out-of-distribution conditions such as clutter, occlusion, and novel object instances. Existing approaches address this gap through offline data collection and retraining, which are slow and do not resolve deployment-time failures. We propose iTeach, a failure-driven interactive teaching framework for adapting robot perception in the wild. A co-located human observes model predictions during deployment, identifies failure cases, and performs a short human-object interaction (HumanPlay) to expose informative object configurations while recording RGB-D sequences. To minimize annotation effort, iTeach employs a Few-Shot Semi-Supervised (FS3) labeling strategy, where only the final frame of a short interaction sequence is annotated using hands-free eye-gaze and voice commands, and labels are propagated across the video to produce dense supervision. The collected failure-driven samples are used for iterative fine-tuning, enabling progressive deployment-time adaptation of the perception model. We evaluate iTeach on unseen object instance segmentation (UOIS) starting from a pretrained MSMFormer model. Using a small number of failure-driven samples, our method significantly improves segmentation performance across diverse real-world scenes. These improvements directly translate to higher grasping and pick-and-place success on the SceneReplica benchmark and real robotic experiments. Our results demonstrate that failure-driven, co-located interactive teaching enables efficient in-the-wild adaptation of robot perception and improves downstream manipulation performance. Project page: https://irvlutd.github.io/iTeach
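
The control flow described in the abstract (spot a failure, record a short sequence, annotate only the final frame, propagate labels, fine-tune) can be illustrated with a toy sketch. This is not the authors' implementation: every name here (`ToyModel`, `propagate_labels`, `fine_tune`, `teaching_round`) is hypothetical, frames are reduced to tuples of object names, and "segmentation" is reduced to set membership, purely to show how the steps might fit together.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

Frame = Tuple[str, ...]  # toy stand-in for an RGB-D frame: names of visible objects
Label = Set[str]         # toy stand-in for instance masks: names of labeled objects


@dataclass
class ToyModel:
    """Hypothetical perception model that only 'segments' objects it knows."""
    known: Set[str]

    def predict(self, frame: Frame) -> Label:
        return {obj for obj in frame if obj in self.known}


def propagate_labels(frames: List[Frame], final_label: Label) -> List[Tuple[Frame, Label]]:
    """Spread the single final-frame annotation to every frame in the sequence.

    Assumption: a real system would use a video propagation model here;
    this toy version just keeps the annotated objects visible in each frame.
    """
    return [(frame, {obj for obj in frame if obj in final_label}) for frame in frames]


def fine_tune(model: ToyModel, samples: List[Tuple[Frame, Label]]) -> ToyModel:
    """Toy 'fine-tuning': the model learns every object that appears in a label."""
    learned = set(model.known)
    for _, label in samples:
        learned |= label
    return ToyModel(known=learned)


def teaching_round(
    model: ToyModel,
    sequences: List[List[Frame]],
    is_failure: Callable[[Label, Frame], bool],  # stands in for the human's judgment
    annotate_final: Callable[[Frame], Label],    # stands in for gaze/voice annotation
) -> Tuple[ToyModel, int]:
    """One adaptation round: collect failure-driven samples, then fine-tune."""
    samples: List[Tuple[Frame, Label]] = []
    for frames in sequences:
        final = frames[-1]
        if not is_failure(model.predict(final), final):
            continue  # no failure observed, so nothing is annotated
        samples.extend(propagate_labels(frames, annotate_final(final)))
    if not samples:
        return model, 0
    return fine_tune(model, samples), len(samples)
```

Calling `teaching_round` repeatedly with newly observed sequences mirrors the iterative fine-tuning in the abstract: only sequences flagged as failures cost any annotation effort, and each costs a single annotated frame.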
