Enhancing Video-Based Robot Failure Detection Using Task Knowledge
This work addresses robust failure detection for robotic systems, offering incremental improvements in performance for specific datasets.
The paper tackles the detection of robot execution failures in real-world scenarios by incorporating spatio-temporal task knowledge, such as the robot's actions and task-relevant objects, into a video-based approach. This raises the F1 score on the ARMBench dataset from 77.9 to 80.0, and to 81.4 with test-time augmentation.
Robust robotic task execution hinges on the reliable detection of execution failures to trigger safe operation modes, recovery strategies, or task replanning. However, many failure detection methods struggle to perform well across the variety of real-world scenarios. In this paper, we propose a video-based failure detection approach that uses spatio-temporal knowledge in the form of the actions the robot performs and the task-relevant objects within the field of view. Both pieces of information are available in most robotic scenarios and can thus be readily obtained. We demonstrate the effectiveness of our approach on three datasets, some of which we amend with additional annotations of this task-relevant knowledge. In light of the results, we also propose a data augmentation method that improves performance by applying variable frame rates to different parts of the video. On the ARMBench dataset, the F1 score improves from 77.9 to 80.0 without additional computational expense, and further to 81.4 with test-time augmentation. The results emphasize the importance of spatio-temporal information for failure detection and suggest further investigation of suitable heuristics in future implementations. Code and annotations are available.
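The abstract does not specify how the frame rate is varied, so the following is only a minimal sketch under stated assumptions. It assumes the video is split into segments at action boundaries (for example, taken from the action annotations), and that each segment is subsampled with its own integer stride, where stride k keeps every k-th frame. During training, a stride is drawn at random for every segment. At test time, failure scores are averaged over a few fixed stride patterns. The names `variable_rate_indices`, `random_variable_rate`, `tta_failure_score`, and the `predict` callback are hypothetical and are not taken from the paper's released code.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical segment format: (start, end) frame indices, end exclusive,
# e.g. derived from action annotations such as "approach" or "grasp".
Segment = Tuple[int, int]


def variable_rate_indices(segments: Sequence[Segment], strides: Sequence[int]) -> List[int]:
    """Subsample each segment with its own stride (stride k keeps every k-th frame)."""
    if len(segments) != len(strides):
        raise ValueError("need exactly one stride per segment")
    indices: List[int] = []
    for (start, end), stride in zip(segments, strides):
        if stride < 1:
            raise ValueError("stride must be >= 1")
        indices.extend(range(start, end, stride))
    return indices


def random_variable_rate(
    segments: Sequence[Segment],
    stride_choices: Sequence[int] = (1, 2, 3),
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Training-time augmentation (assumed): draw an independent stride per segment."""
    rng = rng or random.Random()
    strides = [rng.choice(stride_choices) for _ in segments]
    return variable_rate_indices(segments, strides), strides


def tta_failure_score(
    predict: Callable[[List[int]], float],
    segments: Sequence[Segment],
    stride_sets: Sequence[Sequence[int]],
) -> float:
    """Test-time augmentation (assumed): average the model's failure score
    over several fixed per-segment stride patterns."""
    scores = [predict(variable_rate_indices(segments, s)) for s in stride_sets]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    segments = [(0, 10), (10, 16)]  # two hypothetical action segments
    print(variable_rate_indices(segments, [2, 1]))
    # prints [0, 2, 4, 6, 8, 10, 11, 12, 13, 14, 15]
```

In this sketch, the first segment is played at double speed while the second keeps every frame. This is one way to emphasize some phases of a task over others. Whether the paper chooses segments, strides, or test-time patterns this way is not stated in the abstract.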