CVDec 21, 2022Code
DExT: Detector Explanation ToolkitDeepan Chakravarthi Padmanabhan, Paul G. Plöger, Octavio Arriaga et al.
State-of-the-art object detectors are treated as black boxes due to their highly non-linear internal computations. Even with unprecedented advancements in detector performance, the inability to explain how their outputs are generated limits their use in safety-critical applications. Previous work fails to produce explanations for both bounding box and classification decisions, and generally make individual explanations for various detectors. In this paper, we propose an open-source Detector Explanation Toolkit (DExT) which implements the proposed approach to generate a holistic explanation for all detector decisions using certain gradient-based explanation methods. We suggests various multi-object visualization methods to merge the explanations of multiple objects detected in an image as well as the corresponding detections in a single image. The quantitative evaluation show that the Single Shot MultiBox Detector (SSD) is more faithfully explained compared to other detectors regardless of the explanation methods. Both quantitative and human-centric evaluations identify that SmoothGrad with Guided Backpropagation (GBP) provides more trustworthy explanations among selected methods across all detectors. We expect that DExT will motivate practitioners to evaluate object detectors from the interpretability perspective by explaining both bounding box and classification decisions.
CVJun 4, 2023
Sanity Checks for Saliency Methods Explaining Object DetectorsDeepan Chakravarthi Padmanabhan, Paul G. Plöger, Octavio Arriaga et al.
Saliency methods are frequently used to explain Deep Neural Network-based models. Adebayo et al.'s work on evaluating saliency methods for classification models illustrate certain explanation methods fail the model and data randomization tests. However, on extending the tests for various state of the art object detectors we illustrate that the ability to explain a model is more dependent on the model itself than the explanation method. We perform sanity checks for object detection and define new qualitative criteria to evaluate the saliency explanations, both for object classification and bounding box decisions, using Guided Backpropagation, Integrated Gradients, and their Smoothgrad versions, together with Faster R-CNN, SSD, and EfficientDet-D0, trained on COCO. In addition, the sensitivity of the explanation method to model parameters and data labels varies class-wise motivating to perform the sanity checks for each class. We find that EfficientDet-D0 is the most interpretable method independent of the saliency method, which passes the sanity checks with little problems.
ROFeb 28, 2024
A Multimodal Handover Failure Detection Dataset and BaselinesSantosh Thoduka, Nico Hochgeschwender, Juergen Gall et al.
An object handover between a robot and a human is a coordinated action which is prone to failure for reasons such as miscommunication, incorrect actions and unexpected object properties. Existing works on handover failure detection and prevention focus on preventing failures due to object slip or external disturbances. However, there is a lack of datasets and evaluation methods that consider unpreventable failures caused by the human participant. To address this deficit, we present the multimodal Handover Failure Detection dataset, which consists of failures induced by the human participant, such as ignoring the robot or not releasing the object. We also present two baseline methods for handover failure detection: (i) a video classification method using 3D CNNs and (ii) a temporal action segmentation approach which jointly classifies the human action, robot action and overall outcome of the action. The results show that video is an important modality, but using force-torque data and gripper position help improve failure detection and action segmentation accuracy.
ROApr 8, 2024
A Neuromorphic Approach to Obstacle Avoidance in Robot ManipulationAhmed Faisal Abdelrahman, Matias Valdenegro-Toro, Maren Bennewitz et al.
Neuromorphic computing mimics computational principles of the brain in $\textit{silico}$ and motivates research into event-based vision and spiking neural networks (SNNs). Event cameras (ECs) exclusively capture local intensity changes and offer superior power consumption, response latencies, and dynamic ranges. SNNs replicate biological neuronal dynamics and have demonstrated potential as alternatives to conventional artificial neural networks (ANNs), such as in reducing energy expenditure and inference time in visual classification. Nevertheless, these novel paradigms remain scarcely explored outside the domain of aerial robots. To investigate the utility of brain-inspired sensing and data processing, we developed a neuromorphic approach to obstacle avoidance on a camera-equipped manipulator. Our approach adapts high-level trajectory plans with reactive maneuvers by processing emulated event data in a convolutional SNN, decoding neural activations into avoidance motions, and adjusting plans using a dynamic motion primitive. We conducted experiments with a Kinova Gen3 arm performing simple reaching tasks that involve obstacles in sets of distinct task scenarios and in comparison to a non-adaptive baseline. Our neuromorphic approach facilitated reliable avoidance of imminent collisions in simulated and real-world experiments, where the baseline consistently failed. Trajectory adaptations had low impacts on safety and predictability criteria. Among the notable SNN properties were the correlation of computations with the magnitude of perceived motions and a robustness to different event emulation methods. Tests with a DAVIS346 EC showed similar performance, validating our experimental event emulation. Our results motivate incorporating SNN learning, utilizing neuromorphic processors, and further exploring the potential of neuromorphic methods.
ROAug 26, 2025
Enhancing Video-Based Robot Failure Detection Using Task KnowledgeSantosh Thoduka, Sebastian Houben, Juergen Gall et al.
Robust robotic task execution hinges on the reliable detection of execution failures in order to trigger safe operation modes, recovery strategies, or task replanning. However, many failure detection methods struggle to provide meaningful performance when applied to a variety of real-world scenarios. In this paper, we propose a video-based failure detection approach that uses spatio-temporal knowledge in the form of the actions the robot performs and task-relevant objects within the field of view. Both pieces of information are available in most robotic scenarios and can thus be readily obtained. We demonstrate the effectiveness of our approach on three datasets that we amend, in part, with additional annotations of the aforementioned task-relevant knowledge. In light of the results, we also propose a data augmentation method that improves performance by applying variable frame rates to different parts of the video. We observe an improvement from 77.9 to 80.0 in F1 score on the ARMBench dataset without additional computational expense and an additional increase to 81.4 with test-time augmentation. The results emphasize the importance of spatio-temporal information during failure detection and suggest further investigation of suitable heuristics in future implementations. Code and annotations are available.
ROAug 19, 2021
Property-Based Testing in Simulation for Verifying Robot Action Execution in Tabletop ManipulationSalman Omar Sohail, Alex Mitrevski, Nico Hochgeschwender et al.
An important prerequisite for the reliability and robustness of a service robot is ensuring the robot's correct behavior when it performs various tasks of interest. Extensive testing is one established approach for ensuring behavioural correctness; this becomes even more important with the integration of learning-based methods into robot software architectures, as there are often no theoretical guarantees about the performance of such methods in varying scenarios. In this paper, we aim towards evaluating the correctness of robot behaviors in tabletop manipulation through automatic generation of simulated test scenarios in which a robot assesses its performance using property-based testing. In particular, key properties of interest for various robot actions are encoded in an action ontology and are then verified and validated within a simulated environment. We evaluate our framework with a Toyota Human Support Robot (HSR) which is tested in a Gazebo simulation. We show that our framework can correctly and consistently identify various failed actions in a variety of randomised tabletop manipulation scenarios, in addition to providing deeper insights into the type and location of failures for each designed property.
ROJul 29, 2021
Using Visual Anomaly Detection for Task Execution MonitoringSantosh Thoduka, Juergen Gall, Paul G. Plöger
Execution monitoring is essential for robots to detect and respond to failures. Since it is impossible to enumerate all failures for a given task, we learn from successful executions of the task to detect visual anomalies during runtime. Our method learns to predict the motions that occur during the nominal execution of a task, including camera and robot body motion. A probabilistic U-Net architecture is used to learn to predict optical flow, and the robot's kinematics and 3D model are used to model camera and body motion. The errors between the observed and predicted motion are used to calculate an anomaly score. We evaluate our method on a dataset of a robot placing a book on a shelf, which includes anomalies such as falling books, camera occlusions, and robot disturbances. We find that modeling camera and body motion, in addition to the learning-based optical flow prediction, results in an improvement of the area under the receiver operating characteristic curve from 0.752 to 0.804, and the area under the precision-recall curve from 0.467 to 0.549.
ROJul 20, 2021
Ontology-Assisted Generalisation of Robot Action Execution KnowledgeAlex Mitrevski, Paul G. Plöger, Gerhard Lakemeyer
When an autonomous robot learns how to execute actions, it is of interest to know if and when the execution policy can be generalised to variations of the learning scenarios. This can inform the robot about the necessity of additional learning, as using incomplete or unsuitable policies can lead to execution failures. Generalisation is particularly relevant when a robot has to deal with a large variety of objects and in different contexts. In this paper, we propose and analyse a strategy for generalising parameterised execution models of manipulation actions over different objects based on an object ontology. In particular, a robot transfers a known execution model to objects of related classes according to the ontology, but only if there is no other evidence that the model may be unsuitable. This allows using ontological knowledge as prior information that is then refined by the robot's own experiences. We verify our algorithm for two actions - grasping and stowing everyday objects - such that we show that the robot can deduce cases in which an existing policy can generalise to other objects and when additional execution knowledge has to be acquired.
ROMay 20, 2021
Robot Action Diagnosis and Experience Correction by Falsifying Parameterised Execution ModelsAlex Mitrevski, Paul G. Plöger, Gerhard Lakemeyer
When faced with an execution failure, an intelligent robot should be able to identify the likely reasons for the failure and adapt its execution policy accordingly. This paper addresses the question of how to utilise knowledge about the execution process, expressed in terms of learned constraints, in order to direct the diagnosis and experience acquisition process. In particular, we present two methods for creating a synergy between failure diagnosis and execution model learning. We first propose a method for diagnosing execution failures of parameterised action execution models, which searches for action parameters that violate a learned precondition model. We then develop a strategy that uses the results of the diagnosis process for generating synthetic data that are more likely to lead to successful execution, thereby increasing the set of available experiences to learn from. The diagnosis and experience correction methods are evaluated for the problem of handle grasping, such that we experimentally demonstrate the effectiveness of the diagnosis algorithm and show that corrected failed experiences can contribute towards improving the execution success of a robot.
CLSep 2, 2020
Comparative Evaluation of Pretrained Transfer Learning Models on Automatic Short Answer GradingSasi Kiran Gaddipati, Deebul Nair, Paul G. Plöger
Automatic Short Answer Grading (ASAG) is the process of grading the student answers by computational approaches given a question and the desired answer. Previous works implemented the methods of concept mapping, facet mapping, and some used the conventional word embeddings for extracting semantic features. They extracted multiple features manually to train on the corresponding datasets. We use pretrained embeddings of the transfer learning models, ELMo, BERT, GPT, and GPT-2 to assess their efficiency on this task. We train with a single feature, cosine similarity, extracted from the embeddings of these models. We compare the RMSE scores and correlation measurements of the four models with previous works on Mohler dataset. Our work demonstrates that ELMo outperformed the other three models. We also, briefly describe the four transfer learning models and conclude with the possible causes of poor results of transfer learning models.
ROMar 23, 2020
Performance Evaluation of Low-Cost Machine Vision Cameras for Image-Based Grasp VerificationDeebul Nair, Amirhossein Pakdaman, Paul G. Plöger
Grasp verification is advantageous for autonomous manipulation robots as they provide the feedback required for higher level planning components about successful task completion. However, a major obstacle in doing grasp verification is sensor selection. In this paper, we propose a vision based grasp verification system using machine vision cameras, with the verification problem formulated as an image classification task. Machine vision cameras consist of a camera and a processing unit capable of on-board deep learning inference. The inference in these low-power hardware are done near the data source, reducing the robot's dependence on a centralized server, leading to reduced latency, and improved reliability. Machine vision cameras provide the deep learning inference capabilities using different neural accelerators. Although, it is not clear from the documentation of these cameras what is the effect of these neural accelerators on performance metrics such as latency and throughput. To systematically benchmark these machine vision cameras, we propose a parameterized model generator that generates end to end models of Convolutional Neural Networks(CNN). Using these generated models we benchmark latency and throughput of two machine vision cameras, JeVois A33 and Sipeed Maix Bit. Our experiments demonstrate that the selected machine vision camera and the deep learning models can robustly verify grasp with 97% per frame accuracy.