RO · CV · Aug 26, 2025

Enhancing Video-Based Robot Failure Detection Using Task Knowledge

arXiv:2508.18705v2 · 1 citation · h-index: 17 · EMCR
Originality: Incremental advance
AI Analysis

This work addresses robust failure detection for robotic systems and offers incremental performance improvements on the evaluated datasets.

The paper tackles the detection of robot execution failures in real-world scenarios by incorporating spatio-temporal task knowledge, namely the actions the robot performs and the task-relevant objects in view, into a video-based approach. This raises the F1 score on the ARMBench dataset from 77.9 to 80.0, and to 81.4 with test-time augmentation.
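For readers less familiar with the metric, the sketch below shows how an F1 score is computed from detection counts. It assumes "failure" is the positive class and that the reported numbers are F1 × 100; the summary does not state whether ARMBench results use a per-class or averaged F1, so treat this as an illustration of the formula only.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), written in the equivalent form 2TP / (2TP + FP + FN).

    tp: failures correctly flagged, fp: successes wrongly flagged as failures,
    fn: failures that were missed (positive class assumed to be 'failure').
    """
    denom = 2 * tp + fp + fn
    # Convention when there are no positives and no positive predictions: report 0.0.
    return 2 * tp / denom if denom else 0.0


# Example: 8 detected failures, 2 false alarms, 2 missed failures.
print(round(100 * f1_score(8, 2, 2), 1))  # prints 80.0
```

On this scale, the reported move from 77.9 to 80.0 corresponds to an F1 change of about 0.021.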

Robust robotic task execution hinges on the reliable detection of execution failures in order to trigger safe operation modes, recovery strategies, or task replanning. However, many failure detection methods struggle to provide meaningful performance when applied to a variety of real-world scenarios. In this paper, we propose a video-based failure detection approach that uses spatio-temporal knowledge in the form of the actions the robot performs and task-relevant objects within the field of view. Both pieces of information are available in most robotic scenarios and can thus be readily obtained. We demonstrate the effectiveness of our approach on three datasets that we amend, in part, with additional annotations of the aforementioned task-relevant knowledge. In light of the results, we also propose a data augmentation method that improves performance by applying variable frame rates to different parts of the video. We observe an improvement from 77.9 to 80.0 in F1 score on the ARMBench dataset without additional computational expense and an additional increase to 81.4 with test-time augmentation. The results emphasize the importance of spatio-temporal information during failure detection and suggest further investigation of suitable heuristics in future implementations. Code and annotations are available.
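The abstract describes augmentation that applies variable frame rates to different parts of a video, plus a test-time variant. The sketch below is one plausible reading, not the authors' code: it assumes the video is split into segments at hypothetical boundary indices (for example, action boundaries), subsamples each segment with its own randomly chosen stride, and, at test time, averages a model's failure probability over several such views. The function names, stride choices, and number of views are all illustrative assumptions.

```python
import random
from typing import Callable, List, Optional, Sequence


def variable_frame_rate_indices(
    num_frames: int, boundaries: Sequence[int], strides: Sequence[int]
) -> List[int]:
    """Frame indices where segment k (starting at boundaries[k]) keeps every strides[k]-th frame.

    boundaries: hypothetical segment start indices, strictly increasing, first must be 0.
    strides: one stride per segment (1 = original frame rate, 2 = half rate, ...).
    """
    if len(boundaries) != len(strides):
        raise ValueError("need exactly one stride per segment")
    if not boundaries or boundaries[0] != 0 or boundaries[-1] >= num_frames:
        raise ValueError("boundaries must start at 0 and lie inside the video")
    if any(nxt <= cur for cur, nxt in zip(boundaries, boundaries[1:])):
        raise ValueError("boundaries must be strictly increasing")
    ends = list(boundaries[1:]) + [num_frames]
    indices: List[int] = []
    for start, end, stride in zip(boundaries, ends, strides):
        indices.extend(range(start, end, stride))  # temporal order is preserved
    return indices


def sample_strides(
    num_segments: int, choices: Sequence[int] = (1, 2, 3), rng: Optional[random.Random] = None
) -> List[int]:
    """Draw an independent stride per segment (the choice set is an assumption)."""
    rng = rng if rng is not None else random.Random()
    return [rng.choice(choices) for _ in range(num_segments)]


def tta_failure_probability(
    predict: Callable[[list], float],
    frames: list,
    boundaries: Sequence[int],
    num_views: int = 4,
    seed: int = 0,
) -> float:
    """Average a model's failure probability over several variable-frame-rate views."""
    rng = random.Random(seed)
    scores = []
    for _ in range(num_views):
        strides = sample_strides(len(boundaries), rng=rng)
        idx = variable_frame_rate_indices(len(frames), boundaries, strides)
        scores.append(predict([frames[i] for i in idx]))
    return sum(scores) / len(scores)


# Example: 10 frames, second segment starts at frame 4 and is played at half rate.
print(variable_frame_rate_indices(10, [0, 4], [1, 2]))  # prints [0, 1, 2, 3, 4, 6, 8]
```

During training, a single call with freshly sampled strides per clip would serve as the augmentation; averaging over views is the test-time counterpart, which explains why it adds inference cost while the training-time variant does not.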
