Paul G. Plöger

h-index14

16papers

416citations

Novelty40%

AI Score38

Ranked #84,912 of 194,257 authors (top 44%)#2,547 in RO (top 38%)

16 Papers

5.0CVJun 4, 2023

Sanity Checks for Saliency Methods Explaining Object Detectors

Deepan Chakravarthi Padmanabhan, Paul G. Plöger, Octavio Arriaga et al.

Saliency methods are frequently used to explain Deep Neural Network-based models. Adebayo et al.'s work on evaluating saliency methods for classification models illustrate certain explanation methods fail the model and data randomization tests. However, on extending the tests for various state of the art object detectors we illustrate that the ability to explain a model is more dependent on the model itself than the explanation method. We perform sanity checks for object detection and define new qualitative criteria to evaluate the saliency explanations, both for object classification and bounding box decisions, using Guided Backpropagation, Integrated Gradients, and their Smoothgrad versions, together with Faster R-CNN, SSD, and EfficientDet-D0, trained on COCO. In addition, the sensitivity of the explanation method to model parameters and data labels varies class-wise motivating to perform the sanity checks for each class. We find that EfficientDet-D0 is the most interpretable method independent of the saliency method, which passes the sanity checks with little problems.

17.5CVOct 20, 2017Code

Real-time Convolutional Neural Networks for Emotion and Gender Classification

Octavio Arriaga, Matias Valdenegro-Toro, Paul Plöger

In this paper we propose an implement a general convolutional neural network (CNN) building framework for designing real-time CNNs. We validate our models by creating a real-time vision system which accomplishes the tasks of face detection, gender classification and emotion classification simultaneously in one blended step using our proposed CNN architecture. After presenting the details of the training procedure setup we proceed to evaluate on standard benchmark sets. We report accuracies of 96% in the IMDB gender dataset and 66% in the FER-2013 emotion dataset. Along with this we also introduced the very recent real-time enabled guided back-propagation visualization technique. Guided back-propagation uncovers the dynamics of the weight changes and evaluates the learned features. We argue that the careful implementation of modern CNN architectures, the use of the current regularization methods and the visualization of previously hidden features are necessary in order to reduce the gap between slow performances and real-time architectures. Our system has been validated by its deployment on a Care-O-bot 3 robot used during RoboCup@Home competitions. All our code, demos and pre-trained architectures have been released under an open-source license in our public repository.

8.3ROFeb 28, 2024Code

A Multimodal Handover Failure Detection Dataset and Baselines

Santosh Thoduka, Nico Hochgeschwender, Juergen Gall et al.

An object handover between a robot and a human is a coordinated action which is prone to failure for reasons such as miscommunication, incorrect actions and unexpected object properties. Existing works on handover failure detection and prevention focus on preventing failures due to object slip or external disturbances. However, there is a lack of datasets and evaluation methods that consider unpreventable failures caused by the human participant. To address this deficit, we present the multimodal Handover Failure Detection dataset, which consists of failures induced by the human participant, such as ignoring the robot or not releasing the object. We also present two baseline methods for handover failure detection: (i) a video classification method using 3D CNNs and (ii) a temporal action segmentation approach which jointly classifies the human action, robot action and overall outcome of the action. The results show that video is an important modality, but using force-torque data and gripper position help improve failure detection and action segmentation accuracy.

2.2ROApr 8, 2024

A Neuromorphic Approach to Obstacle Avoidance in Robot Manipulation

Ahmed Faisal Abdelrahman, Matias Valdenegro-Toro, Maren Bennewitz et al.

Neuromorphic computing mimics computational principles of the brain in $\textit{silico}$ and motivates research into event-based vision and spiking neural networks (SNNs). Event cameras (ECs) exclusively capture local intensity changes and offer superior power consumption, response latencies, and dynamic ranges. SNNs replicate biological neuronal dynamics and have demonstrated potential as alternatives to conventional artificial neural networks (ANNs), such as in reducing energy expenditure and inference time in visual classification. Nevertheless, these novel paradigms remain scarcely explored outside the domain of aerial robots. To investigate the utility of brain-inspired sensing and data processing, we developed a neuromorphic approach to obstacle avoidance on a camera-equipped manipulator. Our approach adapts high-level trajectory plans with reactive maneuvers by processing emulated event data in a convolutional SNN, decoding neural activations into avoidance motions, and adjusting plans using a dynamic motion primitive. We conducted experiments with a Kinova Gen3 arm performing simple reaching tasks that involve obstacles in sets of distinct task scenarios and in comparison to a non-adaptive baseline. Our neuromorphic approach facilitated reliable avoidance of imminent collisions in simulated and real-world experiments, where the baseline consistently failed. Trajectory adaptations had low impacts on safety and predictability criteria. Among the notable SNN properties were the correlation of computations with the magnitude of perceived motions and a robustness to different event emulation methods. Tests with a DAVIS346 EC showed similar performance, validating our experimental event emulation. Our results motivate incorporating SNN learning, utilizing neuromorphic processors, and further exploring the potential of neuromorphic methods.

3.2ROAug 26, 2025

Enhancing Video-Based Robot Failure Detection Using Task Knowledge

Santosh Thoduka, Sebastian Houben, Juergen Gall et al.

Robust robotic task execution hinges on the reliable detection of execution failures in order to trigger safe operation modes, recovery strategies, or task replanning. However, many failure detection methods struggle to provide meaningful performance when applied to a variety of real-world scenarios. In this paper, we propose a video-based failure detection approach that uses spatio-temporal knowledge in the form of the actions the robot performs and task-relevant objects within the field of view. Both pieces of information are available in most robotic scenarios and can thus be readily obtained. We demonstrate the effectiveness of our approach on three datasets that we amend, in part, with additional annotations of the aforementioned task-relevant knowledge. In light of the results, we also propose a data augmentation method that improves performance by applying variable frame rates to different parts of the video. We observe an improvement from 77.9 to 80.0 in F1 score on the ARMBench dataset without additional computational expense and an additional increase to 81.4 with test-time augmentation. The results emphasize the importance of spatio-temporal information during failure detection and suggest further investigation of suitable heuristics in future implementations. Code and annotations are available.

3.0ROAug 19, 2021

Property-Based Testing in Simulation for Verifying Robot Action Execution in Tabletop Manipulation

Salman Omar Sohail, Alex Mitrevski, Nico Hochgeschwender et al.

An important prerequisite for the reliability and robustness of a service robot is ensuring the robot's correct behavior when it performs various tasks of interest. Extensive testing is one established approach for ensuring behavioural correctness; this becomes even more important with the integration of learning-based methods into robot software architectures, as there are often no theoretical guarantees about the performance of such methods in varying scenarios. In this paper, we aim towards evaluating the correctness of robot behaviors in tabletop manipulation through automatic generation of simulated test scenarios in which a robot assesses its performance using property-based testing. In particular, key properties of interest for various robot actions are encoded in an action ontology and are then verified and validated within a simulated environment. We evaluate our framework with a Toyota Human Support Robot (HSR) which is tested in a Gazebo simulation. We show that our framework can correctly and consistently identify various failed actions in a variety of randomised tabletop manipulation scenarios, in addition to providing deeper insights into the type and location of failures for each designed property.

1.4CVAug 2, 2021

Forward-Looking Sonar Patch Matching: Modern CNNs, Ensembling, and Uncertainty

Arka Mallick, Paul Plöger, Matias Valdenegro-Toro

Application of underwater robots are on the rise, most of them are dependent on sonar for underwater vision, but the lack of strong perception capabilities limits them in this task. An important issue in sonar perception is matching image patches, which can enable other techniques like localization, change detection, and mapping. There is a rich literature for this problem in color images, but for acoustic images, it is lacking, due to the physics that produce these images. In this paper we improve on our previous results for this problem (Valdenegro-Toro et al, 2017), instead of modeling features manually, a Convolutional Neural Network (CNN) learns a similarity function and predicts if two input sonar images are similar or not. With the objective of improving the sonar image matching problem further, three state of the art CNN architectures are evaluated on the Marine Debris dataset, namely DenseNet, and VGG, with a siamese or two-channel architecture, and contrastive loss. To ensure a fair evaluation of each network, thorough hyper-parameter optimization is executed. We find that the best performing models are DenseNet Two-Channel network with 0.955 AUC, VGG-Siamese with contrastive loss at 0.949 AUC and DenseNet Siamese with 0.921 AUC. By ensembling the top performing DenseNet two-channel and DenseNet-Siamese models overall highest prediction accuracy obtained is 0.978 AUC, showing a large improvement over the 0.91 AUC in the state of the art.

11.6ROJul 29, 2021Code

Using Visual Anomaly Detection for Task Execution Monitoring

Santosh Thoduka, Juergen Gall, Paul G. Plöger

Execution monitoring is essential for robots to detect and respond to failures. Since it is impossible to enumerate all failures for a given task, we learn from successful executions of the task to detect visual anomalies during runtime. Our method learns to predict the motions that occur during the nominal execution of a task, including camera and robot body motion. A probabilistic U-Net architecture is used to learn to predict optical flow, and the robot's kinematics and 3D model are used to model camera and body motion. The errors between the observed and predicted motion are used to calculate an anomaly score. We evaluate our method on a dataset of a robot placing a book on a shelf, which includes anomalies such as falling books, camera occlusions, and robot disturbances. We find that modeling camera and body motion, in addition to the learning-based optical flow prediction, results in an improvement of the area under the receiver operating characteristic curve from 0.752 to 0.804, and the area under the precision-recall curve from 0.467 to 0.549.

7.3ROJul 20, 2021Code

Ontology-Assisted Generalisation of Robot Action Execution Knowledge

Alex Mitrevski, Paul G. Plöger, Gerhard Lakemeyer

When an autonomous robot learns how to execute actions, it is of interest to know if and when the execution policy can be generalised to variations of the learning scenarios. This can inform the robot about the necessity of additional learning, as using incomplete or unsuitable policies can lead to execution failures. Generalisation is particularly relevant when a robot has to deal with a large variety of objects and in different contexts. In this paper, we propose and analyse a strategy for generalising parameterised execution models of manipulation actions over different objects based on an object ontology. In particular, a robot transfers a known execution model to objects of related classes according to the ontology, but only if there is no other evidence that the model may be unsuitable. This allows using ontological knowledge as prior information that is then refined by the robot's own experiences. We verify our algorithm for two actions - grasping and stowing everyday objects - such that we show that the robot can deduce cases in which an existing policy can generalise to other objects and when additional execution knowledge has to be acquired.

5.3ROMay 20, 2021Code

Robot Action Diagnosis and Experience Correction by Falsifying Parameterised Execution Models

Alex Mitrevski, Paul G. Plöger, Gerhard Lakemeyer

When faced with an execution failure, an intelligent robot should be able to identify the likely reasons for the failure and adapt its execution policy accordingly. This paper addresses the question of how to utilise knowledge about the execution process, expressed in terms of learned constraints, in order to direct the diagnosis and experience acquisition process. In particular, we present two methods for creating a synergy between failure diagnosis and execution model learning. We first propose a method for diagnosing execution failures of parameterised action execution models, which searches for action parameters that violate a learned precondition model. We then develop a strategy that uses the results of the diagnosis process for generating synthetic data that are more likely to lead to successful execution, thereby increasing the set of available experiences to learn from. The diagnosis and experience correction methods are evaluated for the problem of handle grasping, such that we experimentally demonstrate the effectiveness of the diagnosis algorithm and show that corrected failed experiences can contribute towards improving the execution success of a robot.

1.2CVOct 29, 2020

Black-Box Optimization of Object Detector Scales

Mohandass Muthuraja, Octavio Arriaga, Paul Plöger et al.

Object detectors have improved considerably in the last years by using advanced CNN architectures. However, many detector hyper-parameters are generally manually tuned, or they are used with values set by the detector authors. Automatic Hyper-parameter optimization has not been explored in improving CNN-based object detectors hyper-parameters. In this work, we propose the use of Black-box optimization methods to tune the prior/default box scales in Faster R-CNN and SSD, using Bayesian Optimization, SMAC, and CMA-ES. We show that by tuning the input image size and prior box anchor scale on Faster R-CNN mAP increases by 2% on PASCAL VOC 2007, and by 3% with SSD. On the COCO dataset with SSD there are mAP improvement in the medium and large objects, but mAP decreases by 1% in small objects. We also perform a regression analysis to find the significant hyper-parameters to tune.

0.8CLSep 2, 2020Code

Comparative Evaluation of Pretrained Transfer Learning Models on Automatic Short Answer Grading

Sasi Kiran Gaddipati, Deebul Nair, Paul G. Plöger

Automatic Short Answer Grading (ASAG) is the process of grading the student answers by computational approaches given a question and the desired answer. Previous works implemented the methods of concept mapping, facet mapping, and some used the conventional word embeddings for extracting semantic features. They extracted multiple features manually to train on the corresponding datasets. We use pretrained embeddings of the transfer learning models, ELMo, BERT, GPT, and GPT-2 to assess their efficiency on this task. We train with a single feature, cosine similarity, extracted from the embeddings of these models. We compare the RMSE scores and correlation measurements of the four models with previous works on Mohler dataset. Our work demonstrates that ELMo outperformed the other three models. We also, briefly describe the four transfer learning models and conclude with the possible causes of poor results of transfer learning models.

6.5CVJul 3, 2020

Evaluating Uncertainty Estimation Methods on 3D Semantic Segmentation of Point Clouds

Swaroop Bhandary K, Nico Hochgeschwender, Paul Plöger et al.

Deep learning models are extensively used in various safety critical applications. Hence these models along with being accurate need to be highly reliable. One way of achieving this is by quantifying uncertainty. Bayesian methods for UQ have been extensively studied for Deep Learning models applied on images but have been less explored for 3D modalities such as point clouds often used for Robots and Autonomous Systems. In this work, we evaluate three uncertainty quantification methods namely Deep Ensembles, MC-Dropout and MC-DropConnect on the DarkNet21Seg 3D semantic segmentation model and comprehensively analyze the impact of various parameters such as number of models in ensembles or forward passes, and drop probability values, on task performance and uncertainty estimate quality. We find that Deep Ensembles outperforms other methods in both performance and uncertainty metrics. Deep ensembles outperform other methods by a margin of 2.4% in terms of mIOU, 1.3% in terms of accuracy, while providing reliable uncertainty for decision making.

4.1ROMar 23, 2020Code

Performance Evaluation of Low-Cost Machine Vision Cameras for Image-Based Grasp Verification

Deebul Nair, Amirhossein Pakdaman, Paul G. Plöger

Grasp verification is advantageous for autonomous manipulation robots as they provide the feedback required for higher level planning components about successful task completion. However, a major obstacle in doing grasp verification is sensor selection. In this paper, we propose a vision based grasp verification system using machine vision cameras, with the verification problem formulated as an image classification task. Machine vision cameras consist of a camera and a processing unit capable of on-board deep learning inference. The inference in these low-power hardware are done near the data source, reducing the robot's dependence on a centralized server, leading to reduced latency, and improved reliability. Machine vision cameras provide the deep learning inference capabilities using different neural accelerators. Although, it is not clear from the documentation of these cameras what is the effect of these neural accelerators on performance metrics such as latency and throughput. To systematically benchmark these machine vision cameras, we propose a parameterized model generator that generates end to end models of Convolutional Neural Networks(CNN). Using these generated models we benchmark latency and throughput of two machine vision cameras, JeVois A33 and Sipeed Maix Bit. Our experiments demonstrate that the selected machine vision camera and the deep learning models can robustly verify grasp with 97% per frame accuracy.

0.9CVNov 7, 2017

Image Captioning and Classification of Dangerous Situations

Octavio Arriaga, Paul Plöger, Matias Valdenegro-Toro

Current robot platforms are being employed to collaborate with humans in a wide range of domestic and industrial tasks. These environments require autonomous systems that are able to classify and communicate anomalous situations such as fires, injured persons, car accidents; or generally, any potentially dangerous situation for humans. In this paper we introduce an anomaly detection dataset for the purpose of robot applications as well as the design and implementation of a deep learning architecture that classifies and describes dangerous situations using only a single image as input. We report a classification accuracy of 97 % and METEOR score of 16.2. We will make the dataset publicly available after this paper is accepted.

1.1CVJun 3, 2016

On Recognizing Transparent Objects in Domestic Environments Using Fusion of Multiple Sensor Modalities

Alexander Hagg, Frederik Hegger, Paul Plöger

Current object recognition methods fail on object sets that include both diffuse, reflective and transparent materials, although they are very common in domestic scenarios. We show that a combination of cues from multiple sensor modalities, including specular reflectance and unavailable depth information, allows us to capture a larger subset of household objects by extending a state of the art object recognition method. This leads to a significant increase in robustness of recognition over a larger set of commonly used objects.