Rustam Stolkin

RO
h-index19
35papers
785citations
Novelty44%
AI Score51

35 Papers

RONov 25, 2022
A Hierarchical Variable Autonomy Mixed-Initiative Framework for Human-Robot Teaming in Mobile Robotics

Dimitris Panagopoulos, Giannis Petousakis, Aniketh Ramesh et al.

This paper presents a Mixed-Initiative (MI) framework for addressing the problem of control authority transfer between a remote human operator and an AI agent when cooperatively controlling a mobile robot. Our Hierarchical Expert-guided Mixed-Initiative Control Switcher (HierEMICS) leverages information on the human operator's state and intent. The control switching policies are based on a criticality hierarchy. An experimental evaluation was conducted in a high-fidelity simulated disaster response and remote inspection scenario, comparing HierEMICS with a state-of-the-art Expert-guided Mixed-Initiative Control Switcher (EMICS) in the context of mobile robot navigation. Results suggest that HierEMICS reduces conflicts for control between the human and the AI agent, which is a fundamental challenge in both the MI control paradigm and also in the related shared control paradigm. Additionally, we provide statistically significant evidence of improved, navigational safety (i.e., fewer collisions), LOA switching efficiency, and conflict for control reduction.

ROJul 4, 2022
Robot Vitals and Robot Health: Towards Systematically Quantifying Runtime Performance Degradation in Robots Under Adverse Conditions

Aniketh Ramesh, Rustam Stolkin, Manolis Chiou

This paper addresses the problem of automatically detecting and quantifying performance degradation in remote mobile robots during task execution. A robot may encounter a variety of uncertainties and adversities during task execution, which can impair its ability to carry out tasks effectively and cause its performance to degrade. Such situations can be mitigated or averted by timely detection and intervention (e.g., by a remote human supervisor taking over control in teleoperation mode). Inspired by patient triaging systems in hospitals, we introduce the framework of "robot vitals" for estimating overall "robot health". A robot's vitals are a set of indicators that estimate the extent of performance degradation faced by a robot at a given point in time. Robot health is a metric that combines robot vitals into a single scalar value estimate of performance degradation. Experiments, both in simulation and on a real mobile robot, demonstrate that the proposed robot vitals and robot health can be used effectively to estimate robot performance degradation during runtime.

ROApr 20
Hybrid Task and Motion Planning with Reactive Collision Handling for Multi-Robot Disassembly of Complex Products: Application to EV Batteries

Abdelaziz Shaarawy, Cansu Erdogan, Rustam Stolkin et al.

This paper addresses the problem of multi-robot coordination for complex manipulation task sequences. We present a vision-driven task-and-motion planning (TAMP) framework for a real dual-agent platform that integrates task decomposition and allocation with a learning-based RRT planner. A GMM-informed motion planner is coupled with a hybrid safety layer that combines predictive collision checking in a MoveIt/FCL digital twin with reactive vision-based avoidance and replanning. This integration is challenging as the system jointly satisfies task precedence, geometric feasibility, dynamic obstacle avoidance, and dual-arm coordination constraints. The framework operates in closed loop by updating the remaining task sequence from repeated scene scans and completion-state tracking rather than executing a fixed open-loop plan. In EV battery disassembly experiments, compared with Default-RRTConnect under identical perception and task assignments, the proposed system reduces cumulative end-effector path length from 48.8 to 17.9~m ($-63.3\%$), improves makespan from 467.9 to 429.8~s ($-8.1\%$), and reduces swept volumes (R1: $0.583\rightarrow0.139\,\mathrm{m}^3$, R2: $0.696\rightarrow0.252\,\mathrm{m}^3$) and overlap ($0.064\rightarrow0.034\,\mathrm{m}^3$). These results show that combining predictive planning and reactive collision avoidance in a real dual-arm disassembly scenario improves motion compactness, safety, and scalability to broader multi-robot sequential manipulation tasks.

CVAug 1, 2018Code
Weather Classification: A new multi-class dataset, data augmentation approach and comprehensive evaluations of Convolutional Neural Networks

Jose Carlos Villarreal Guerra, Zeba Khanam, Shoaib Ehsan et al.

Weather conditions often disrupt the proper functioning of transportation systems. Present systems either deploy an array of sensors or use an in-vehicle camera to predict weather conditions. These solutions have resulted in incremental cost and limited scope. To ensure smooth operation of all transportation services in all-weather conditions, a reliable detection system is necessary to classify weather in wild. The challenges involved in solving this problem is that weather conditions are diverse in nature and there is an absence of discriminate features among various weather conditions. The existing works to solve this problem have been scene specific and have targeted classification of two categories of weather. In this paper, we have created a new open source dataset consisting of images depicting three classes of weather i.e rain, snow and fog called RFS Dataset. A novel algorithm has also been proposed which has used super pixel delimiting masks as a form of data augmentation, leading to reasonable results with respect to ten Convolutional Neural Network architectures.

ROJul 14, 2025
Probabilistic Human Intent Prediction for Mobile Manipulation: An Evaluation with Human-Inspired Constraints

Cesar Alan Contreras, Manolis Chiou, Alireza Rastegarpanah et al.

Accurate inference of human intent enables human-robot collaboration without constraining human control or causing conflicts between humans and robots. We present GUIDER (Global User Intent Dual-phase Estimation for Robots), a probabilistic framework that enables a robot to estimate the intent of human operators. GUIDER maintains two coupled belief layers, one tracking navigation goals and the other manipulation goals. In the Navigation phase, a Synergy Map blends controller velocity with an occupancy grid to rank interaction areas. Upon arrival at a goal, an autonomous multi-view scan builds a local 3D cloud. The Manipulation phase combines U2Net saliency, FastSAM instance saliency, and three geometric grasp-feasibility tests, with an end-effector kinematics-aware update rule that evolves object probabilities in real-time. GUIDER can recognize areas and objects of intent without predefined goals. We evaluated GUIDER on 25 trials (five participants x five task variants) in Isaac Sim, and compared it with two baselines, one for navigation and one for manipulation. Across the 25 trials, GUIDER achieved a median stability of 93-100% during navigation, compared with 60-100% for the BOIR baseline, with an improvement of 39.5% in a redirection scenario (T5). During manipulation, stability reached 94-100% (versus 69-100% for Trajectron), with a 31.4% difference in a redirection task (T3). In geometry-constrained trials (manipulation), GUIDER recognized the object intent three times earlier than Trajectron (median remaining time to confident prediction 23.6 s vs 7.8 s). These results validate our dual-phase framework and show improvements in intent inference in both phases of mobile manipulation tasks.

ROOct 20, 2025
Intent-Driven LLM Ensemble Planning for Flexible Multi-Robot Disassembly: Demonstration on EV Batteries

Cansu Erdogan, Cesar Alan Contreras, Alireza Rastegarpanah et al.

This paper addresses the problem of planning complex manipulation tasks, in which multiple robots with different end-effectors and capabilities, informed by computer vision, must plan and execute concatenated sequences of actions on a variety of objects that can appear in arbitrary positions and configurations in unstructured scenes. We propose an intent-driven planning pipeline which can robustly construct such action sequences with varying degrees of supervisory input from a human using simple language instructions. The pipeline integrates: (i) perception-to-text scene encoding, (ii) an ensemble of large language models (LLMs) that generate candidate removal sequences based on the operator's intent, (iii) an LLM-based verifier that enforces formatting and precedence constraints, and (iv) a deterministic consistency filter that rejects hallucinated objects. The pipeline is evaluated on an example task in which two robot arms work collaboratively to dismantle an Electric Vehicle battery for recycling applications. A variety of components must be grasped and removed in specific sequences, determined by human instructions and/or by task-order feasibility decisions made by the autonomous system. On 200 real scenes with 600 operator prompts across five component classes, we used metrics of full-sequence correctness and next-task correctness to evaluate and compare five LLM-based planners (including ablation analyses of pipeline components). We also evaluated the LLM-based human interface in terms of time to execution and NASA TLX with human participant experiments. Results indicate that our ensemble-with-verification approach reliably maps operator intent to safe, executable multi-robot plans while maintaining low user effort.

ROAug 14, 2025
Utilizing Vision-Language Models as Action Models for Intent Recognition and Assistance

Cesar Alan Contreras, Manolis Chiou, Alireza Rastegarpanah et al.

Human-robot collaboration requires robots to quickly infer user intent, provide transparent reasoning, and assist users in achieving their goals. Our recent work introduced GUIDER, our framework for inferring navigation and manipulation intents. We propose augmenting GUIDER with a vision-language model (VLM) and a text-only language model (LLM) to form a semantic prior that filters objects and locations based on the mission prompt. A vision pipeline (YOLO for object detection and the Segment Anything Model for instance segmentation) feeds candidate object crops into the VLM, which scores their relevance given an operator prompt; in addition, the list of detected object labels is ranked by a text-only LLM. These scores weight the existing navigation and manipulation layers of GUIDER, selecting context-relevant targets while suppressing unrelated objects. Once the combined belief exceeds a threshold, autonomy changes occur, enabling the robot to navigate to the desired area and retrieve the desired object, while adapting to any changes in the operator's intent. Future work will evaluate the system on Isaac Sim using a Franka Emika arm on a Ridgeback base, with a focus on real-time assistance.

ROMar 19, 2025
Geometrically-Aware One-Shot Skill Transfer of Category-Level Objects

Cristiana de Farias, Luis Figueredo, Riddhiman Laha et al.

Robotic manipulation of unfamiliar objects in new environments is challenging and requires extensive training or laborious pre-programming. We propose a new skill transfer framework, which enables a robot to transfer complex object manipulation skills and constraints from a single human demonstration. Our approach addresses the challenge of skill acquisition and task execution by deriving geometric representations from demonstrations focusing on object-centric interactions. By leveraging the Functional Maps (FM) framework, we efficiently map interaction functions between objects and their environments, allowing the robot to replicate task operations across objects of similar topologies or categories, even when they have significantly different shapes. Additionally, our method incorporates a Task-Space Imitation Algorithm (TSIA) which generates smooth, geometrically-aware robot paths to ensure the transferred skills adhere to the demonstrated task constraints. We validate the effectiveness and adaptability of our approach through extensive experiments, demonstrating successful skill transfer and task execution in diverse real-world environments without requiring additional training.

CVNov 5, 2024
Self-supervised cross-modality learning for uncertainty-aware object detection and recognition in applications which lack pre-labelled training data

Irum Mehboob, Li Sun, Alireza Astegarpanah et al.

This paper shows how an uncertainty-aware, deep neural network can be trained to detect, recognise and localise objects in 2D RGB images, in applications lacking annotated train-ng datasets. We propose a self-supervising teacher-student pipeline, in which a relatively simple teacher classifier, trained with only a few labelled 2D thumbnails, automatically processes a larger body of unlabelled RGB-D data to teach a student network based on a modified YOLOv3 architecture. Firstly, 3D object detection with back projection is used to automatically extract and teach 2D detection and localisation information to the student network. Secondly, a weakly supervised 2D thumbnail classifier, with minimal training on a small number of hand-labelled images, is used to teach object category recognition. Thirdly, we use a Gaussian Process GP to encode and teach a robust uncertainty estimation functionality, so that the student can output confidence scores with each categorization. The resulting student significantly outperforms the same YOLO architecture trained directly on the same amount of labelled data. Our GP-based approach yields robust and meaningful uncertainty estimations for complex industrial object classifications. The end-to-end network is also capable of real-time processing, needed for robotics applications. Our method can be applied to many important industrial tasks, where labelled datasets are typically unavailable. In this paper, we demonstrate an example of detection, localisation, and object category recognition of nuclear mixed-waste materials in highly cluttered and unstructured scenes. This is critical for robotic sorting and handling of legacy nuclear waste, which poses complex environmental remediation challenges in many nuclearised nations.

CVMay 10, 2023
Local Region-to-Region Mapping-based Approach to Classify Articulated Objects

Ayush Aggarwal, Rustam Stolkin, Naresh Marturi

Autonomous robots operating in real-world environments encounter a variety of objects that can be both rigid and articulated in nature. Having knowledge of these specific object properties not only helps in designing appropriate manipulation strategies but also aids in developing reliable tracking and pose estimation techniques for many robotic and vision applications. In this context, this paper presents a registration-based local region-to-region mapping approach to classify an object as either articulated or rigid. Using the point clouds of the intended object, the proposed method performs classification by estimating unique local transformations between point clouds over the observed sequence of movements of the object. The significant advantage of the proposed method is that it is a constraint-free approach that can classify any articulated object and is not limited to a specific type of articulation. Additionally, it is a model-free approach with no learning components, which means it can classify whether an object is articulated without requiring any object models or labelled data. We analyze the performance of the proposed method on two publicly available benchmark datasets with a combination of articulated and rigid objects. It is observed that the proposed method can classify articulated and rigid objects with good accuracy.

ROOct 5, 2021
Fessonia: a Method for Real-Time Estimation of Human Operator Workload Using Behavioural Entropy

Paraskevas Chatzithanos, Grigoris Nikolaou, Rustam Stolkin et al.

This paper addresses the problem of the human operator cognitive workload estimation while controlling a robot. Being capable of assessing, in real-time, the operator's workload could help prevent calamitous events from occurring. This workload estimation could enable an AI to make informed decisions to assist or advise the operator, in an advanced human-robot interaction framework. We propose a method, named Fessonia, for real-time cognitive workload estimation from multiple parameters of an operator's driving behaviour via the use of behavioural entropy. Fessonia is comprised of: a method to calculate the entropy (i.e. unpredictability) of the operator driving behaviour profile; the Driver Profile Update algorithm which adapts the entropy calculations to the evolving driving profile of individual operators; and a Warning And Indication System that uses workload estimations to issue advice to the operator. Fessonia is evaluated in a robot teleoperation scenario that incorporated cognitively demanding secondary tasks to induce varying degrees of workload. The results demonstrate the ability of Fessonia to estimate different levels of imposed workload. Additionally, it is demonstrated that our approach is able to detect and adapt to the evolving driving profile of the different operators. Lastly, based on data obtained, a decrease in entropy is observed when a warning indication is issued, suggesting a more attentive approach focused on the primary navigation task.

ROSep 24, 2021
A Bayesian-Based Approach to Human Operator Intent Recognition in Remote Mobile Robot Navigation

Dimitris Panagopoulos, Giannis Petousakis, Rustam Stolkin et al.

This paper addresses the problem of human operator intent recognition during teleoperated robot navigation. In this context, recognition of the operator's intended navigational goal, could enable an artificial intelligence (AI) agent to assist the operator in an advanced human-robot interaction framework. We propose a Bayesian Operator Intent Recognition (BOIR) probabilistic method that utilizes: (i) an observation model that fuses information as a weighting combination of multiple observation sources providing geometric information; (ii) a transition model that indicates the evolution of the state; and (iii) an action model, the Active Intent Recognition Model (AIRM), that enables the operator to communicate their explicit intent asynchronously. The proposed method is evaluated in an experiment where operators controlling a remote mobile robot are tasked with navigation and exploration under various scenarios with different map and obstacle layouts. Results demonstrate that BOIR outperforms two related methods from literature in terms of accuracy and uncertainty of the intent recognition.

ROAug 26, 2021
Human operator cognitive availability aware Mixed-Initiative control

Giannis Petousakis, Manolis Chiou, Grigoris Nikolaou et al.

This paper presents a Cognitive Availability Aware Mixed-Initiative Controller for remotely operated mobile robots. The controller enables dynamic switching between different levels of autonomy (LOA), initiated by either the AI or the human operator. The controller leverages a state-of-the-art computer vision method and an off-the-shelf web camera to infer the cognitive availability of the operator and inform the AI-initiated LOA switching. This constitutes a qualitative advancement over previous Mixed-Initiative (MI) controllers. The controller is evaluated in a disaster response experiment, in which human operators have to conduct an exploration task with a remote robot. MI systems are shown to effectively assist the operators, as demonstrated by quantitative and qualitative results in performance and workload. Additionally, some insights into the experimental difficulties of evaluating complex MI controllers are presented.

ROJul 26, 2021
SpectGRASP: Robotic Grasping by Spectral Correlation

Maxime Adjigble, Cristiana de Farias, Rustam Stolkin et al.

This paper presents a spectral correlation-based method (SpectGRASP) for robotic grasping of arbitrarily shaped, unknown objects. Given a point cloud of an object, SpectGRASP extracts contact points on the object's surface matching the hand configuration. It neither requires offline training nor a-priori object models. We propose a novel Binary Extended Gaussian Image (BEGI), which represents the point cloud surface normals of both object and robot fingers as signals on a 2-sphere. Spherical harmonics are then used to estimate the correlation between fingers and object BEGIs. The resulting spectral correlation density function provides a similarity measure of gripper and object surface normals. This is highly efficient in that it is simultaneously evaluated at all possible finger rotations in SO(3). A set of contact points are then extracted for each finger using rotations with high correlation values. We then use our previous work, Local Contact Moment (LoCoMo) similarity metric, to sequentially rank the generated grasps such that the one with maximum likelihood is executed. We evaluate the performance of SpectGRASP by conducting experiments with a 7-axis robot fitted with a parallel-jaw gripper, in a physics simulation environment. Obtained results indicate that the method not only can grasp individual objects, but also can successfully clear randomly organized groups of objects. The SpectGRASP method also outperforms the closest state-of-the-art method in terms of grasp generation time and grasp-efficiency.

ROJul 17, 2021
Dual Quaternion-Based Visual Servoing for Grasping Moving Objects

Cristiana de Farias, Maxime Adjigble, Brahim Tamadazte et al.

This paper presents a new dual quaternion-based formulation for pose-based visual servoing. Extending our previous work on local contact moment (LoCoMo) based grasp planning, we demonstrate grasping of arbitrarily moving objects in 3D space. Instead of using the conventional axis-angle parameterization, dual quaternions allow designing the visual servoing task in a more compact manner and provide robustness to manipulator singularities. Given an object point cloud, LoCoMo generates a ranked list of grasp and pre-grasp poses, which are used as desired poses for visual servoing. Whenever the object moves (tracked by visual marker tracking), the desired pose updates automatically. For this, capitalising on the dual quaternion spatial distance error, we propose a dynamic grasp re-ranking metric to select the best feasible grasp for the moving object. This allows the robot to readily track and grasp arbitrarily moving objects. In addition, we also explore the robot null-space with our controller to avoid joint limits so as to achieve smooth trajectories while following moving objects. We evaluate the performance of the proposed visual servoing by conducting simulation experiments of grasping various objects using a 7-axis robot fitted with a 2-finger gripper. Obtained results demonstrate the efficiency of our proposed visual servoing.

ROJul 1, 2021
Trust, Shared Understanding and Locus of Control in Mixed-Initiative Robotic Systems

Manolis Chiou, Faye McCabe, Markella Grigoriou et al.

This paper investigates how trust, shared understanding between a human operator and a robot, and the Locus of Control (LoC) personality trait, evolve and affect Human-Robot Interaction (HRI) in mixed-initiative robotic systems. As such systems become more advanced and able to instigate actions alongside human operators, there is a shift from robots being perceived as a tool to being a team-mate. Hence, the team-oriented human factors investigated in this paper (i.e. trust, shared understanding, and LoC) can play a crucial role in efficient HRI. Here, we present the results from an experiment inspired by a disaster response scenario in which operators remotely controlled a mobile robot in navigation tasks, with either human-initiative or mixed-initiative control, switching dynamically between two different levels of autonomy: teleoperation and autonomous navigation. Evidence suggests that operators trusted and developed an understanding of the robotic systems, especially in mixed-initiative control, where trust and understanding increased over time, as operators became more familiar with the system and more capable of performing the task. Lastly, evidence and insights are presented on how LoC affects HRI.

ROFeb 28, 2021
Simultaneous Tactile Exploration and Grasp Refinement for Unknown Objects

Cristiana de Farias, Naresh Marturi, Rustam Stolkin et al.

This paper addresses the problem of simultaneously exploring an unknown object to model its shape, using tactile sensors on robotic fingers, while also improving finger placement to optimise grasp stability. In many situations, a robot will have only a partial camera view of the near side of an observed object, for which the far side remains occluded. We show how an initial grasp attempt, based on an initial guess of the overall object shape, yields tactile glances of the far side of the object which enable the shape estimate and consequently the successive grasps to be improved. We propose a grasp exploration approach using a probabilistic representation of shape, based on Gaussian Process Implicit Surfaces. This representation enables initial partial vision data to be augmented with additional data from successive tactile glances. This is combined with a probabilistic estimate of grasp quality to refine grasp configurations. When choosing the next set of finger placements, a bi-objective optimisation method is used to mutually maximise grasp quality and improve shape representation during successive grasp attempts. Experimental results show that the proposed approach yields stable grasp configurations more efficiently than a baseline method, while also yielding improved shape estimate of the grasped object.

RONov 10, 2020
VFH+ based shared control for remotely operated mobile robots

Pantelis Pappas, Manolis Chiou, Georgios-Theofanis Epsimos et al.

This paper addresses the problem of safe and efficient navigation in remotely controlled robots operating in hazardous and unstructured environments; or conducting other remote robotic tasks. A shared control method is presented which blends the commands from a VFH+ obstacle avoidance navigation module with the teleoperation commands provided by an operator via a joypad. The presented approach offers several advantages such as flexibility allowing for a straightforward adaptation of the controller's behaviour and easy integration with variable autonomy systems; as well as the ability to cope with dynamic environments. The advantages of the presented controller are demonstrated by an experimental evaluation in a disaster response scenario. More specifically, presented evidence show a clear performance increase in terms of safety and task completion time compared to a pure teleoperation approach, as well as an ability to cope with previously unobserved obstacles.

RONov 12, 2019
Mixed-Initiative variable autonomy for remotely operated mobile robots

Manolis Chiou, Nick Hawes, Rustam Stolkin

This paper presents an Expert-guided Mixed-Initiative Control Switcher (EMICS) for remotely operated mobile robots. The EMICS enables switching between different levels of autonomy during task execution initiated by either the human operator and/or the EMICS. The EMICS is evaluated in two disaster response inspired experiments, one with a simulated robot and test arena, and one with a real robot in a realistic environment. Analyses from the two experiments provide evidence that: a) Human-Initiative (HI) systems outperform systems with single modes of operation, such as pure teleoperation, in navigation tasks; b) in the context of the simulated robot experiment, Mixed-Initiative (MI) systems provide improved performance in navigation tasks, improved operator performance in cognitive demanding secondary tasks, and improved operator workload compared to HI. Results also reinforce previous human-robot interaction evidence regarding the importance of the operator's personality traits and their trust in the autonomous system. Lastly, our experiment on a physical robot provides empirical evidence that identify two major challenges for MI control: a) the design of context-aware MI control systems; and b) the conflict for control between the robot's MI control system and the operator. Insights regarding these challenges are discussed and ways to tackle them are proposed.

RONov 11, 2019
Estimation and Exploitation of Objects' Inertial Parameters in Robotic Grasping and Manipulation: A Survey

Nikos Mavrakis, Rustam Stolkin

Inertial parameters characterise an object's motion under applied forces, and can provide strong priors for planning and control of robotic actions to manipulate the object. However, these parameters are not available a-priori in situations where a robot encounters new objects. In this paper, we describe and categorise the ways that a robot can identify an object's inertial parameters. We also discuss grasping and manipulation methods in which knowledge of inertial parameters is exploited in various ways. We begin with a discussion of literature which investigates how humans estimate the inertial parameters of objects, to provide background and motivation for this area of robotics research. We frame our discussion of the robotics literature in terms of three categories of estimation methods, according to the amount of interaction with the object: purely visual, exploratory, and fixed-object. Each category is analysed and discussed. To demonstrate the usefulness of inertial estimation research, we describe a number of grasping and manipulation applications that make use of the inertial parameters of objects. The aim of the paper is to thoroughly review and categorise existing work in an important, but under-explored, area of robotics research, present its background and applications, and suggest future directions. Note that this paper does not examine methods of identification of the robot's inertial parameters, but rather the identification of inertial parameters of other objects which the robot is tasked with manipulating.

ROJul 18, 2019
Robust and fast generation of top and side grasps for unknown objects

Brice Denoun, Beatriz Leon, Claudio Zito et al.

In this work, we present a geometry-based grasping algorithm that is capable of efficiently generating both top and side grasps for unknown objects, using a single view RGB-D camera, and of selecting the most promising one. We demonstrate the effectiveness of our approach on a picking scenario on a real robot platform. Our approach has shown to be more reliable than another recent geometry-based method considered as baseline [7] in terms of grasp stability, by increasing the successful grasp attempts by a factor of six.

ROJun 27, 2019
Automatic Detection of Myocontrol Failures Based upon Situational Context Information

Karoline Heiwolt, Claudio Zito, Markus Nowak et al.

Myoelectric control systems for assistive devices are still unreliable. The user's input signals can become unstable over time due to e.g. fatigue, electrode displacement, or sweat. Hence, such controllers need to be constantly updated and heavily rely on user feedback. In this paper, we present an automatic failure detection method which learns when plausible predictions become unreliable and model updates are necessary. Our key insight is to enhance the control system with a set of generative models that learn sensible behaviour for a desired task from human demonstration. We illustrate our approach on a grasping scenario in Virtual Reality, in which the user is asked to grasp a bottle on a table. From demonstration our model learns the reach-to-grasp motion from a resting position to two grasps (power grasp and tridigital grasp) and how to predict the most adequate grasp from local context, e.g. tridigital grasp on the bottle cap or around the bottleneck. By measuring the error between new grasp attempts and the model prediction, the system can effectively detect which input commands do not reflect the user's intention. We evaluated our model in two cases: i) with both position and rotation information of the wrist pose, and ii) with only rotational information. Our results show that our approach detects statistically highly significant differences in error distributions with p < 0.001 between successful and failed grasp attempts in both cases.

ROJun 19, 2019
Metrics and Benchmarks for Remote Shared Controllers in Industrial Applications

Claudio Zito, Maxime Adjigble, Brice D. Denoun et al.

Remote manipulation is emerging as one of the key robotics tasks needed in extreme environments. Several researchers have investigated how to add AI components into shared controllers to improve their reliability. Nonetheless, the impact of novel research approaches in real-world applications can have a very slow in-take. We propose a set of benchmarks and metrics to evaluate how the AI components of remote shared control algorithms can improve the effectiveness of such frameworks for real industrial applications. We also present an empirical evaluation of a simple intelligent share controller against a manually operated manipulator in a tele-operated grasping scenario.

ROJun 19, 2019
2D Linear Time-Variant Controller for Human's Intention Detection for Reach-to-Grasp Trajectories in Novel Scenes

Claudio Zito, Tomasz Deregowski, Rustam Stolkin

Designing robotic assistance devices for manipulation tasks is challenging. This work is concerned with improving accuracy and usability of semi-autonomous robots, such as human operated manipulators or exoskeletons. The key insight is to develop a system that takes into account context- and user-awareness to take better decisions in how to assist the user. The context-awareness is implemented by enabling the system to automatically generate a set of candidate grasps and reach-to-grasp trajectories in novel, cluttered scenes. The user-awareness is implemented as a linear time-variant feedback controller to facilitate the motion towards the most promising grasp. Our approach is demonstrated in a simple 2D example in which participants are asked to grasp a specific object in a clutter scene. Our approach also reduce the number of controllable dimensions for the user by providing only control on x- and y-axis, while orientation of the end-effector and the pose of its fingers are inferred by the system. The experimental results show the benefits of our approach in terms of accuracy and execution time with respect to a pure manual control.

ROMay 13, 2019
Let's Push Things Forward: A Survey on Robot Pushing

Jochen Stüber, Claudio Zito, Rustam Stolkin

As robot make their way out of factories into human environments, outer space, and beyond, they require the skill to manipulate their environment in multifarious, unforeseeable circumstances. With this regard, pushing is an essential motion primitive that dramatically extends a robot's manipulation repertoire. In this work, we review the robotic pushing literature. While focusing on work concerned with predicting the motion of pushed objects, we also cover relevant applications of pushing for planning and control. Beginning with analytical approaches, under which we also subsume physics engines, we then proceed to discuss work on learning models from data. In doing so, we dedicate a separate section to deep learning approaches which have seen a recent upsurge in the literature. Concluding remarks and further research perspectives are given at the end of the paper.

ROMar 13, 2019
Hypothesis-based Belief Planning for Dexterous Grasping

Claudio Zito, Valerio Ortenzi, Maxime Adjigble et al.

Belief space planning is a viable alternative to formalise partially observable control problems and, in the recent years, its application to robot manipulation problems has grown. However, this planning approach was tried successfully only on simplified control problems. In this paper, we apply belief space planning to the problem of planning dexterous reach-to-grasp trajectories under object pose uncertainty. In our framework, the robot perceives the object to be grasped on-the-fly as a point cloud and compute a full 6D, non-Gaussian distribution over the object's pose (our belief space). The system has no limitations on the geometry of the object, i.e., non-convex objects can be represented, nor assumes that the point cloud is a complete representation of the object. A plan in the belief space is then created to reach and grasp the object, such that the information value of expected contacts along the trajectory is maximised to compensate for the pose uncertainty. If an unexpected contact occurs when performing the action, such information is used to refine the pose distribution and triggers a re-planning. Experimental results show that our planner (IR3ne) improves grasp reliability and compensates for the pose uncertainty such that it doubles the proportion of grasps that succeed on a first attempt.

CVJul 4, 2018
Sensors, SLAM and Long-term Autonomy: A Review

Mubariz Zaffar, Shoaib Ehsan, Rustam Stolkin et al.

Simultaneous Localization and Mapping, commonly known as SLAM, has been an active research area in the field of Robotics over the past three decades. For solving the SLAM problem, every robot is equipped with either a single sensor or a combination of similar/different sensors. This paper attempts to review, discuss, evaluate and compare these sensors. Keeping an eye on future, this paper also assesses the characteristics of these sensors against factors critical to the long-term autonomy challenge.

ROMar 6, 2018
Learning monocular visual odometry with dense 3D mapping from dense 3D flow

Cheng Zhao, Li Sun, Pulak Purkait et al.

This paper introduces a fully deep learning approach to monocular SLAM, which can perform simultaneous localization using a neural network for learning visual odometry (L-VO) and dense 3D mapping. Dense 2D flow and a depth image are generated from monocular images by sub-networks, which are then used by a 3D flow associated layer in the L-VO network to generate dense 3D flow. Given this 3D flow, the dual-stream L-VO network can then predict the 6DOF relative pose and furthermore reconstruct the vehicle trajectory. In order to learn the correlation between motion directions, the Bivariate Gaussian modelling is employed in the loss function. The L-VO network achieves an overall performance of 2.68% for average translational error and 0.0143 deg/m for average rotational error on the KITTI odometry benchmark. Moreover, the learned depth is fully leveraged to generate a dense 3D map. As a result, an entire visual SLAM system, that is, learning monocular odometry combined with dense 3D mapping, is achieved.

RODec 12, 2017
Grasp that optimises objectives along post-grasp trajectories

Amir M Ghalamzan E, Nikos Mavrakis, Rustam Stolkin

In this article, we study the problem of selecting a grasping pose on the surface of an object to be manipulated by considering three post-grasp objectives. These objectives include (i) kinematic manipulation capability, (ii) torque effort \cite{mavrakis2016analysis} and (iii) impact force in case of a collision during post-grasp manipulative actions. In these works, the main assumption is that a manipulation task, i.e. trajectory of the centre of mass (CoM) of an object is given. In addition, inertial properties of the object to be manipulated is known. For example, a robot needs to pick an object located at point A and place it at point B by moving it along a given path. Therefore, the problem to be solved is to find an initial grasp pose that yields the maximum kinematic manipulation capability, minimum joint effort and effective mass along a given post-grasp trajectories. However, these objectives may conflict in some cases making it impossible to obtain the best values for all of them. We perform a series of experiments to show how different objectives change as the grasping pose on an object alters. The experimental results presented in this paper illustrate that these objectives are conflicting for some desired post-grasp trajectories. This indicates that a detailed multi-objective optimization is needed for properly addressing this problem in a future work.

CVSep 30, 2017
Dense RGB-D semantic mapping with Pixel-Voxel neural network

Cheng Zhao, Li Sun, Pulak Purkait et al.

For intelligent robotics applications, extending 3D mapping to 3D semantic mapping enables robots to, not only localize themselves with respect to the scene's geometrical features but also simultaneously understand the higher level meaning of the scene contexts. Most previous methods focus on geometric 3D reconstruction and scene understanding independently notwithstanding the fact that joint estimation can boost the accuracy of the semantic mapping. In this paper, a dense RGB-D semantic mapping system with a Pixel-Voxel network is proposed, which can perform dense 3D mapping while simultaneously recognizing and semantically labelling each point in the 3D map. The proposed Pixel-Voxel network obtains global context information by using PixelNet to exploit the RGB image and meanwhile, preserves accurate local shape information by using VoxelNet to exploit the corresponding 3D point cloud. Unlike the existing architecture that fuses score maps from different models with equal weights, we proposed a Softmax weighted fusion stack that adaptively learns the varying contributions of PixelNet and VoxelNet, and fuses the score maps of the two models according to their respective confidence levels. The proposed Pixel-Voxel network achieves the state-of-the-art semantic segmentation performance on the SUN RGB-D benchmark dataset. The runtime of the proposed system can be boosted to 11-12Hz, enabling near to real-time performance using an i7 8-cores PC with Titan X GPU.

ROJul 25, 2017
Safe Robotic Grasping: Minimum Impact-Force Grasp Selection

Nikos Mavrakis, Amir M. Ghalamzan E., Rustam Stolkin

This paper addresses the problem of selecting from a choice of possible grasps, so that impact forces will be minimised if a collision occurs while the robot is moving the grasped object along a post-grasp trajectory. Such considerations are important for safety in human-robot interaction, where even a certified "human-safe" (e.g. compliant) arm may become hazardous once it grasps and begins moving an object, which may have significant mass, sharp edges or other dangers. Additionally, minimising collision forces is critical to preserving the longevity of robots which operate in uncertain and hazardous environments, e.g. robots deployed for nuclear decommissioning, where removing a damaged robot from a contaminated zone for repairs may be extremely difficult and costly. Also, unwanted collisions between a robot and critical infrastructure (e.g. pipework) in such high-consequence environments can be disastrous. In this paper, we investigate how the safety of the post-grasp motion can be considered during the pre-grasp approach phase, so that the selected grasp is optimal in terms applying minimum impact forces if a collision occurs during a desired post-grasp manipulation. We build on the methods of augmented robot-object dynamics models and "effective mass" and propose a method for combining these concepts with modern grasp and trajectory planners, to enable the robot to achieve a grasp which maximises the safety of the post-grasp trajectory, by minimising potential collision forces. We demonstrate the effectiveness of our approach through several experiments with both simulated and real robots.

ROJul 25, 2017
Human-in-the-loop optimisation: mixed initiative grasping for optimally facilitating post-grasp manipulative actions

Amir M. Ghalamzan Esfahani, Firas Abi-Farraj, Paolo Robuffo Giordano et al.

This paper addresses the problem of mixed initiative, shared control for master-slave grasping and manipulation. We propose a novel system, in which an autonomous agent assists a human in teleoperating a remote slave arm/gripper, using a haptic master device. Our system is designed to exploit the human operator's expertise in selecting stable grasps (still an open research topic in autonomous robotics). Meanwhile, a-priori knowledge of: i) the slave robot kinematics, and ii) the desired post-grasp manipulative trajectory, are fed to an autonomous agent which transmits force cues to the human, to encourage maximally manipulable grasp pose selections. Specifically, the autonomous agent provides force cues to the human, during the reach-to-grasp phase, which encourage the human to select grasp poses which maximise manipulation capability during the post-grasp object manipulation phase. We introduce a task-relevant velocity manipulability cost function (TOV), which is used to identify the maximum kinematic capability of a manipulator during post-grasp motions, and feed this back as force cues to the human during the pre-grasp phase. We show that grasps which minimise TOV result in significantly reduced control effort of the manipulator, compared to other feasible grasps. We demonstrate the effectiveness of our approach by experiments with both real and simulated robots.

ROJul 22, 2017
Single-Shot Clothing Category Recognition in Free-Configurations with Application to Autonomous Clothes Sorting

Li Sun, Gerardo Aragon-Camarasa, Simon Rogers et al.

This paper proposes a single-shot approach for recognising clothing categories from 2.5D features. We propose two visual features, BSP (B-Spline Patch) and TSD (Topology Spatial Distances) for this task. The local BSP features are encoded by LLC (Locality-constrained Linear Coding) and fused with three different global features. Our visual feature is robust to deformable shapes and our approach is able to recognise the category of unknown clothing in unconstrained and random configurations. We integrated the category recognition pipeline with a stereo vision system, clothing instance detection, and dual-arm manipulators to achieve an autonomous sorting system. To verify the performance of our proposed method, we build a high-resolution RGBD clothing dataset of 50 clothing items of 5 categories sampled in random configurations (a total of 2,100 clothing samples). Experimental results show that our approach is able to reach 83.2\% accuracy while classifying clothing items which were previously unseen during training. This advances beyond the previous state-of-the-art by 36.2\%. Finally, we evaluate the proposed approach in an autonomous robot sorting system, in which the robot recognises a clothing item from an unconstrained pile, grasps it, and sorts it into a box according to its category. Our proposed sorting system achieves reasonable sorting success rates with single-shot perception.

CVMar 19, 2017
Weakly-supervised DCNN for RGB-D Object Recognition in Real-World Applications Which Lack Large-scale Annotated Training Data

Li Sun, Cheng Zhao, Rustam Stolkin

This paper addresses the problem of RGBD object recognition in real-world applications, where large amounts of annotated training data are typically unavailable. To overcome this problem, we propose a novel, weakly-supervised learning architecture (DCNN-GPC) which combines parametric models (a pair of Deep Convolutional Neural Networks (DCNN) for RGB and D modalities) with non-parametric models (Gaussian Process Classification). Our system is initially trained using a small amount of labeled data, and then automatically prop- agates labels to large-scale unlabeled data. We first run 3D- based objectness detection on RGBD videos to acquire many unlabeled object proposals, and then employ DCNN-GPC to label them. As a result, our multi-modal DCNN can be trained end-to-end using only a small amount of human annotation. Finally, our 3D-based objectness detection and multi-modal DCNN are integrated into a real-time detection and recognition pipeline. In our approach, bounding-box annotations are not required and boundary-aware detection is achieved. We also propose a novel way to pretrain a DCNN for the depth modality, by training on virtual depth images projected from CAD models. We pretrain our multi-modal DCNN on public 3D datasets, achieving performance comparable to state-of-the-art methods on Washington RGBS Dataset. We then finetune the network by further training on a small amount of annotated data from our novel dataset of industrial objects (nuclear waste simulants). Our weakly supervised approach has demonstrated to be highly effective in solving a novel RGBD object recognition application which lacks of human annotations.

CVMar 14, 2017
A fully end-to-end deep learning approach for real-time simultaneous 3D reconstruction and material recognition

Cheng Zhao, Li Sun, Rustam Stolkin

This paper addresses the problem of simultaneous 3D reconstruction and material recognition and segmentation. Enabling robots to recognise different materials (concrete, metal etc.) in a scene is important for many tasks, e.g. robotic interventions in nuclear decommissioning. Previous work on 3D semantic reconstruction has predominantly focused on recognition of everyday domestic objects (tables, chairs etc.), whereas previous work on material recognition has largely been confined to single 2D images without any 3D reconstruction. Meanwhile, most 3D semantic reconstruction methods rely on computationally expensive post-processing, using Fully-Connected Conditional Random Fields (CRFs), to achieve consistent segmentations. In contrast, we propose a deep learning method which performs 3D reconstruction while simultaneously recognising different types of materials and labelling them at the pixel level. Unlike previous methods, we propose a fully end-to-end approach, which does not require hand-crafted features or CRF post-processing. Instead, we use only learned features, and the CRF segmentation constraints are incorporated inside the fully end-to-end learned system. We present the results of experiments, in which we trained our system to perform real-time 3D semantic reconstruction for 23 different materials in a real-world application. The run-time performance of the system can be boosted to around 10Hz, using a conventional GPU, which is enough to achieve real-time semantic reconstruction using a 30fps RGB-D camera. To the best of our knowledge, this work is the first real-time end-to-end system for simultaneous 3D reconstruction and material recognition.