Peter Corke

RO
h-index: 77
38 papers
3,165 citations
Novelty: 45%
AI Score: 39

38 Papers

RO · Nov 5, 2022
Learning Fabric Manipulation in the Real World with Human Videos

Robert Lee, Jad Abou-Chakra, Fangyi Zhang et al.

Fabric manipulation is a long-standing challenge in robotics due to the enormous state space and complex dynamics. Learning approaches stand out as promising for this domain as they allow us to learn behaviours directly from data. Most prior methods, however, either rely heavily on simulation, which is still limited by the large sim-to-real gap for deformable objects, or rely on large datasets. A promising alternative is to learn fabric manipulation directly from watching humans perform the task. In this work, we explore how demonstrations for fabric manipulation tasks can be collected directly by humans, providing an extremely natural and fast data collection pipeline. Then, using only a handful of such demonstrations, we show how a pick-and-place policy can be learned and deployed on a real robot, without any robot data collection at all. We demonstrate our approach on a fabric folding task, showing that our policy can reliably reach folded states from crumpled initial configurations. Videos are available at: https://sites.google.com/view/foldingbyhand

CV · Mar 29, 2023
The Need for Inherently Privacy-Preserving Vision in Trustworthy Autonomous Systems

Adam K. Taras, Niko Suenderhauf, Peter Corke et al.

Vision is a popular and effective sensor for robotics from which we can derive rich information about the environment: the geometry and semantics of the scene, as well as the age, gender, identity, activity and even emotional state of humans within that scene. This raises important questions about the reach, lifespan, and potential misuse of this information. This paper is a call to action to consider privacy in the context of robotic vision. We propose a specific form of privacy preservation in which no images are captured or could be reconstructed by an attacker even with full remote access. We present a set of principles by which such systems can be designed, and through a case study in localisation demonstrate in simulation a specific implementation that delivers an important robotic capability in an inherently privacy-preserving manner. This is a first step, and we hope to inspire future works that expand the range of applications open to sighted robotic systems.

RO · Sep 10, 2021 · Code
A Holistic Approach to Reactive Mobile Manipulation

Jesse Haviland, Niko Sünderhauf, Peter Corke

We present the design and implementation of a taskable reactive mobile manipulation system. In contrast to related work, we treat the arm and base degrees of freedom as a holistic structure, which greatly improves the speed and fluidity of the resulting motion. At the core of this approach is a robust and reactive motion controller which can achieve a desired end-effector pose, while avoiding joint position and velocity limits, and ensuring the mobile manipulator is manoeuvrable throughout the trajectory. This can support sensor-based behaviours such as closed-loop visual grasping. As no planning is involved in our approach, the robot is never stationary thinking about what to do next. We show the versatility of our holistic motion controller by implementing a pick and place system using behaviour trees and demonstrate this task on a 9-degree-of-freedom mobile manipulator. Additionally, we provide an open-source implementation of our motion controller for both non-holonomic and omnidirectional mobile manipulators available at jhavl.github.io/holistic.

RO · Oct 17, 2020 · Code
A Systematic Approach to Computing the Manipulator Jacobian and Hessian using the Elementary Transform Sequence

Jesse Haviland, Peter Corke

The elementary transform sequence (ETS) provides a universal method of describing the kinematics of any serial-link manipulator. The ETS notation is intuitive and easy to understand, while avoiding the complexity and limitations of Denavit-Hartenberg frame assignment. In this paper, we describe a systematic method for computing the manipulator Jacobian and Hessian (differential kinematics) using the ETS notation. Differential kinematics have many applications including numerical inverse kinematics, resolved-rate motion control and manipulability motion control. Furthermore, we provide an open-source Python library which implements our algorithm and can be interfaced with any serial-link manipulator (available at github.com/petercorke/robotics-toolbox-python).
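
As a rough illustration of the ideas in this abstract, the sketch below describes a planar two-link arm as a sequence of elementary transforms (rotate, translate, rotate, translate), evaluates its position Jacobian, and takes one resolved-rate step. It is a minimal pure-Python sketch, not the paper's algorithm and not the robotics-toolbox-python API: the function names, the unit link lengths and the closed-form Jacobian are all assumptions made for this example.

```python
import math

def rot(theta):
    """Elementary rotation about z as a 3x3 homogeneous SE(2) matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def tx(d):
    """Elementary translation along x as a 3x3 homogeneous SE(2) matrix."""
    return [[1.0, 0.0, d], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def fkine(q, a1=1.0, a2=1.0):
    """Forward kinematics: compose Rz(q1) tx(a1) Rz(q2) tx(a2), return (x, y)."""
    T = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for E in (rot(q[0]), tx(a1), rot(q[1]), tx(a2)):
        T = matmul(T, E)
    return T[0][2], T[1][2]

def jacobian(q, a1=1.0, a2=1.0):
    """2x2 position Jacobian of the planar arm (closed form for this arm only)."""
    s1, c1 = math.sin(q[0]), math.cos(q[0])
    s12, c12 = math.sin(q[0] + q[1]), math.cos(q[0] + q[1])
    return [[-a1 * s1 - a2 * s12, -a2 * s12],
            [a1 * c1 + a2 * c12, a2 * c12]]

def resolved_rate_step(q, v, dt=0.01):
    """One resolved-rate step: solve J qd = v for qd, then integrate over dt."""
    J = jacobian(q)
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    if abs(det) < 1e-9:
        raise ValueError("Jacobian is singular at this configuration")
    qd0 = (J[1][1] * v[0] - J[0][1] * v[1]) / det
    qd1 = (-J[1][0] * v[0] + J[0][0] * v[1]) / det
    return [q[0] + qd0 * dt, q[1] + qd1 * dt]
```

Calling `resolved_rate_step([0.3, 0.8], (0.1, -0.05))` nudges the joints so that the end-effector moves approximately along the requested velocity for one time step; a general manipulator would need the systematic Jacobian computation the paper describes rather than this hand-derived one.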

RO · Oct 17, 2020 · Code
NEO: A Novel Expeditious Optimisation Algorithm for Reactive Motion Control of Manipulators

Jesse Haviland, Peter Corke

We present NEO, a fast and purely reactive motion controller for manipulators which can avoid static and dynamic obstacles while moving to the desired end-effector pose. Additionally, our controller maximises the manipulability of the robot during the trajectory, while avoiding joint position and velocity limits. NEO is wrapped into a strictly convex quadratic programme which, when considering obstacles, joint limits, and manipulability on a 7 degree-of-freedom robot, is generally solved in a few ms. While NEO is not intended to replace state-of-the-art motion planners, our experiments show that it is a viable alternative for scenes with moderate complexity while also being capable of reactive control. For more complex scenes, NEO is better suited as a reactive local controller, in conjunction with a global motion planner. We compare NEO to motion planners on a standard benchmark in simulation and additionally illustrate and verify its operation on a physical robot in a dynamic environment. We provide an open-source library which implements our controller.

RO · Feb 27, 2020 · Code
A Purely-Reactive Manipulability-Maximising Motion Controller

Jesse Haviland, Peter Corke

We present a novel approach to controlling the instantaneous velocity of a robot end-effector that is able to simultaneously maximise manipulability and avoid joint limits. It operates on non-redundant and redundant robots, which is achieved by adding artificial redundancy in the form of controlled path deviation. We formulate the problem as a quadratic programme and provide an open-source Python implementation that provides solutions in just a few milliseconds. It accepts a robot model expressed using URDF or Denavit-Hartenberg parameterisation. We compare our method to previous work in simulation and on a physical robot.
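
To make "manipulability" concrete, the sketch below evaluates one widely used scalar measure, Yoshikawa's m = sqrt(det(J Jᵀ)), for a planar two-link arm. That this is the measure the controller maximises is an assumption here, and the arm model, link lengths and function names are hypothetical choices for illustration only; the quadratic programme itself is not reproduced.

```python
import math

def jacobian_2r(q1, q2, a1=1.0, a2=1.0):
    """2x2 position Jacobian of a planar two-link arm (illustrative model)."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-a1 * s1 - a2 * s12, -a2 * s12],
            [a1 * c1 + a2 * c12, a2 * c12]]

def manipulability(J):
    """Yoshikawa measure sqrt(det(J J^T)) for a 2xN Jacobian given as two rows."""
    r0, r1 = J
    m00 = sum(v * v for v in r0)             # (J J^T)[0][0]
    m01 = sum(u * v for u, v in zip(r0, r1)) # (J J^T)[0][1]
    m11 = sum(v * v for v in r1)             # (J J^T)[1][1]
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(m00 * m11 - m01 * m01, 0.0))
```

For this arm the measure reduces to |a1 · a2 · sin(q2)|, so it is largest with the elbow at a right angle and falls to zero when the arm is fully stretched out, which is the kind of configuration a manipulability-maximising controller steers away from.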

RO · Jan 31, 2020 · Code
Robot Navigation in Unseen Spaces using an Abstract Map

Ben Talbot, Feras Dayoub, Peter Corke et al.

Human navigation in built environments depends on symbolic spatial information which has unrealised potential to enhance robot navigation capabilities. Information sources such as labels, signs, maps, planners, spoken directions, and navigational gestures communicate a wealth of spatial information to the navigators of built environments; a wealth of information that robots typically ignore. We present a robot navigation system that uses the same symbolic spatial information employed by humans to purposefully navigate in unseen built environments with a level of performance comparable to humans. The navigation system uses a novel data structure called the abstract map to imagine malleable spatial models for unseen spaces from spatial symbols. Sensorimotor perceptions from a robot are then employed to provide purposeful navigation to symbolic goal locations in the unseen environment. We show how a dynamic system can be used to create malleable spatial models for the abstract map, and provide an open source implementation to encourage future work in the area of symbolic navigation. Symbolic navigation performance of humans and a robot is evaluated in a real-world built environment. The paper concludes with a qualitative analysis of human navigation strategies, providing further insights into how the symbolic navigation capabilities of robots in unseen built environments can be improved in the future.

RO · Dec 16, 2016 · Code
Mirrored Light Field Video Camera Adapter

Dorian Tsai, Donald G. Dansereau, Steve Martin et al.

This paper proposes the design of a custom mirror-based light field camera adapter that is cheap, simple in construction, and accessible. Mirrors of different shapes and orientations reflect the scene into an upwards-facing camera to create an array of virtual cameras with overlapping fields of view at specified depths, and deliver video frame rate light fields. We describe the design, construction, decoding and calibration processes of our mirror-based light field camera adapter in preparation for an open-source release to benefit the robotic vision community.

RO · Jul 26, 2025
A roadmap for AI in robotics

Aude Billard, Alin Albu-Schaeffer, Michael Beetz et al.

AI technologies, including deep learning and large language models, have gone from one breakthrough to the next. As a result, we are witnessing growing excitement in robotics at the prospect of leveraging the potential of AI to tackle some of the outstanding barriers to the full deployment of robots in our daily lives. However, action and sensing in the physical world pose greater and different challenges than analysing data in isolation. As the development and application of AI in robotic products advances, it is important to reflect on which technologies, among the vast array of network architectures and learning models now available in the AI field, are most likely to be successfully applied to robots; how they can be adapted to specific robot designs, tasks and environments; and which challenges must be overcome. This article offers an assessment of what AI for robotics has achieved since the 1990s and proposes a short- and medium-term research roadmap listing challenges and promises. These range from keeping up-to-date large datasets, representative of the diversity of tasks robots may have to perform and of the environments they may encounter, to designing AI algorithms tailored specifically to robotics problems but generic enough to apply to a wide range of applications and transfer easily to a variety of robotic platforms. For robots to collaborate effectively with humans, they must predict human behavior without relying on bias-based profiling. Explainability and transparency in AI-driven robot control are not optional but essential for building trust, preventing misuse, and attributing responsibility in accidents. We close on what we view as the primary long-term challenges, that is, to design robots capable of lifelong learning, while guaranteeing safe deployment and usage, and sustainable computational costs.

RO · Feb 25, 2022
Visibility Maximization Controller for Robotic Manipulation

Kerry He, Rhys Newbury, Tin Tran et al.

Occlusions caused by a robot's own body are a common problem for closed-loop control methods employed in eye-to-hand camera setups. We propose an optimization-based reactive controller that minimizes self-occlusions while achieving a desired goal pose. The approach allows coordinated control between the robot's base, arm and head by encoding the line-of-sight visibility to the target as a soft constraint along with other task-related constraints, and solving for feasible joint and base velocities. The generalizability of the approach is demonstrated in simulated and real-world experiments, on robots with fixed or mobile bases, with moving or fixed objects, and multiple objects. The experiments revealed a trade-off between occlusion rates and other task metrics. While a planning-based baseline achieved lower occlusion rates than the proposed controller, it came at the expense of highly inefficient paths and a significant drop in task success. On the other hand, the proposed controller is shown to improve line-of-sight visibility to the target object(s) without sacrificing much task success or efficiency. Videos and code can be found at: rhys-newbury.github.io/projects/vmc/.

CV · Aug 19, 2021
FSNet: A Failure Detection Framework for Semantic Segmentation

Quazi Marufur Rahman, Niko Sünderhauf, Peter Corke et al.

Semantic segmentation is an important task that helps autonomous vehicles understand their surroundings and navigate safely. During deployment, even the most mature segmentation models are vulnerable to various external factors that can degrade the segmentation performance with potentially catastrophic consequences for the vehicle and its surroundings. To address this issue, we propose a failure detection framework to identify pixel-level misclassification. We do so by exploiting internal features of the segmentation model and training it simultaneously with a failure detection network. During deployment, the failure detector can flag areas in the image where the segmentation model has failed to segment correctly. We evaluate the proposed approach against state-of-the-art methods and achieve 12.30%, 9.46%, and 9.65% performance improvement in the AUPR-Error metric for the Cityscapes, BDD100K, and Mapillary semantic segmentation datasets, respectively.

RO · Apr 15, 2021
Tabletop Object Rearrangement: Team ACRV's Entry to OCRTOC

Zheyu Zhang, Rhys Newbury, Kerry He et al.

Open Cloud Robot Table Organization Challenge (OCRTOC) is one of the most comprehensive cloud-based robotic manipulation competitions. It focuses on rearranging tabletop objects using vision as its primary sensing modality. In this extended abstract, we present our entry to the OCRTOC2020 and the key challenges the team has experienced.

RO · Mar 29, 2021
Refractive Light-Field Features for Curved Transparent Objects in Structure from Motion

Dorian Tsai, Peter Corke, Thierry Peynot et al.

Curved refractive objects are common in the human environment, and have a complex visual appearance that can cause robotic vision algorithms to fail. Light-field cameras allow us to address this challenge by capturing the view-dependent appearance of such objects in a single exposure. We propose a novel image feature for light fields that detects and describes the patterns of light refracted through curved transparent objects. We derive characteristic points based on these features allowing them to be used in place of conventional 2D features. Using our features, we demonstrate improved structure-from-motion performance in challenging scenes containing refractive objects, including quantitative evaluations that show improved camera pose estimates and 3D reconstructions. Additionally, our methods converge 15-35% more frequently than the state-of-the-art. Our method is a critical step towards allowing robots to operate around refractive objects, with applications in manufacturing, quality assurance, pick-and-place, and domestic robots working with acrylic, glass and other transparent materials.

RO · Jan 5, 2021
Run-Time Monitoring of Machine Learning for Robotic Perception: A Survey of Emerging Trends

Quazi Marufur Rahman, Peter Corke, Feras Dayoub

As deep learning continues to dominate all state-of-the-art computer vision tasks, it is increasingly becoming an essential building block for robotic perception. This raises important questions concerning the safety and reliability of learning-based perception systems. There is an established field that studies safety certification and convergence guarantees of complex software systems at design-time. However, the unknown future deployment environments of an autonomous system and the complexity of learning-based perception make the generalization of design-time verification to run-time problematic. In the face of this challenge, more attention is starting to focus on run-time monitoring of performance and reliability of perception systems with several trends emerging in the literature. This paper attempts to identify these trends and summarise the various approaches to the topic.

RO · Jan 2, 2021
Semantics for Robotic Mapping, Perception and Interaction: A Survey

Sourav Garg, Niko Sünderhauf, Feras Dayoub et al.

For robots to navigate and interact more richly with the world around them, they will likely require a deeper understanding of the world in which they operate. In robotics and related research fields, the study of understanding is often referred to as semantics, which dictates what the world "means" to a robot, and is strongly tied to the question of how to represent that meaning. With humans and robots increasingly operating in the same world, the prospects of human-robot interaction also bring semantics and ontology of natural language into the picture. Driven by need, as well as by enablers like increasing availability of training data and computational resources, semantics is a rapidly growing research area in robotics. The field has received significant attention in the research literature to date, but most reviews and surveys have focused on particular aspects of the topic: the technical research issues regarding its use in specific robotic topics like mapping or segmentation, or its relevance to one particular application domain like autonomous driving. A new treatment is therefore required, and is also timely because so much relevant research has occurred since many of the key surveys were published. This survey therefore provides an overarching snapshot of where semantics in robotics stands today. We establish a taxonomy for semantics research in or relevant to robotics, split into four broad categories of activity, in which semantics are extracted, used, or both. Within these broad categories we survey dozens of major topics including fundamentals from the computer vision field and key robotics research areas utilizing semantics, including mapping, navigation and interaction with the world. The survey also covers key practical considerations, including enablers like increased data availability and improved computational hardware, and major application areas where...

RO · Oct 7, 2020
Learning Arbitrary-Goal Fabric Folding with One Hour of Real Robot Experience

Robert Lee, Daniel Ward, Akansel Cosgun et al.

Manipulating deformable objects, such as fabric, is a long-standing problem in robotics, with state estimation and control posing a significant challenge for traditional methods. In this paper, we show that it is possible to learn fabric folding skills in only an hour of self-supervised real robot experience, without human supervision or simulation. Our approach relies on fully convolutional networks and the manipulation of visual inputs to exploit learned features, allowing us to create an expressive goal-conditioned pick and place policy that can be trained efficiently with real world robot data only. Folding skills are learned with only a sparse reward function and thus do not require reward function engineering, merely an image of the goal configuration. We demonstrate our method on a set of towel-folding tasks, and show that our approach is able to discover sequential folding strategies, purely from trial-and-error. We achieve state-of-the-art results without the need for the demonstrations or simulation used in prior approaches. Videos available at: https://sites.google.com/view/learningtofold

RO · Jun 2, 2020
Object-Independent Human-to-Robot Handovers using Real Time Robotic Vision

Patrick Rosenberger, Akansel Cosgun, Rhys Newbury et al.

We present an approach for safe and object-independent human-to-robot handovers using real-time robotic vision and manipulation. We aim for general applicability by using a generic object detector, a fast grasp selection algorithm and a single gripper-mounted RGB-D camera, hence not relying on external sensors. The robot is controlled via visual servoing towards the object of interest. Putting a high emphasis on safety, we use two perception modules: human body part segmentation and hand/finger segmentation. Pixels that are deemed to belong to the human are filtered out from candidate grasp poses, hence ensuring that the robot safely picks the object without colliding with the human partner. The grasp selection and perception modules run concurrently in real time, which allows monitoring of the progress. In experiments with 13 objects, the robot was able to successfully take the object from the human in 81.9% of the trials.

RO · Mar 3, 2020
EGAD! an Evolved Grasping Analysis Dataset for diversity and reproducibility in robotic manipulation

Douglas Morrison, Peter Corke, Jürgen Leitner

We present the Evolved Grasping Analysis Dataset (EGAD), comprising over 2000 generated objects aimed at training and evaluating robotic visual grasp detection algorithms. The objects in EGAD are geometrically diverse, filling a space ranging from simple to complex shapes and from easy to difficult to grasp, compared to other datasets for robotic grasping, which may be limited in size or contain only a small number of object classes. Additionally, we specify a set of 49 diverse 3D-printable evaluation objects to encourage reproducible testing of robotic grasping systems across a range of complexity and difficulty. The dataset, code and videos can be found at https://dougsm.github.io/egad/

RO · Jan 30, 2020
Model-free vision-based shaping of deformable plastic materials

Andrea Cherubini, Valerio Ortenzi, Akansel Cosgun et al.

We address the problem of shaping deformable plastic materials using non-prehensile actions. Shaping plastic objects is challenging, since they are difficult to model and to track visually. We study this problem using kinetic sand, a plastic toy material which mimics the physical properties of wet sand. Inspired by a pilot study where humans shape kinetic sand, we define two types of actions: pushing the material from the sides and tapping from above. The chosen actions are executed with a robotic arm using image-based visual servoing. From the current and desired view of the material, we define states based on visual features such as the outer contour shape and the pixel luminosity values. These are mapped to actions, which are repeated iteratively to reduce the image error until convergence is reached. For pushing, we propose three methods for mapping the visual state to an action. These include heuristic methods and a neural network, trained from human actions. We show that it is possible to obtain simple shapes with the kinetic sand, without explicitly modeling the material. Our approach is limited in the types of shapes it can achieve. A richer set of action types and multi-step reasoning is needed to achieve more sophisticated shapes.

RO · Jan 16, 2020
Control of the Final-Phase of Closed-Loop Visual Grasping using Image-Based Visual Servoing

Jesse Haviland, Feras Dayoub, Peter Corke

This paper considers the final approach phase of closed-loop visual grasping, where the RGB-D camera is no longer able to provide valid depth information. Many current robotic grasping controllers are not closed-loop and therefore fail for moving objects. Closed-loop grasp controllers based on RGB-D imagery can track a moving object, but fail when the sensor's minimum object distance is violated just before grasping. To overcome this we propose the use of image-based visual servoing (IBVS) to guide the robot to the object-relative grasp pose using camera RGB information. IBVS robustly moves the camera to a goal pose defined implicitly in terms of an image-plane feature configuration. In this work, the goal image feature coordinates are predicted from RGB-D data to enable RGB-only tracking once depth data becomes unavailable; this enables more reliable grasping of previously unseen moving objects. Experimental results are provided.
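
For readers unfamiliar with IBVS, the sketch below shows the textbook interaction matrix (image Jacobian) for a single point feature in normalised image coordinates, which relates camera velocity to the motion of that point in the image. It is an assumption that the paper uses this point-feature form, and the function names are hypothetical. A complete controller would stack several such matrices and apply a pseudo-inverse to obtain the camera velocity, which is omitted here.

```python
def interaction_matrix(x, y, Z):
    """2x6 interaction matrix for a point at normalised coordinates (x, y)
    and depth Z, for camera velocity ordered (vx, vy, vz, wx, wy, wz)."""
    return [
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ]

def feature_velocity(x, y, Z, cam_vel):
    """Image-plane velocity (x_dot, y_dot) of the point for a given camera velocity."""
    L = interaction_matrix(x, y, Z)
    return tuple(sum(L[r][k] * cam_vel[k] for k in range(6)) for r in range(2))
```

Only the translational columns depend on the depth Z, which hints at why losing valid depth close to the object matters and why an estimate of it, or of the goal features, has to come from elsewhere.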

RO · Jan 8, 2020
What can robotics research learn from computer vision research?

Peter Corke, Feras Dayoub, David Hall et al.

The computer vision and robotics research communities are each strong. However, progress in computer vision has become turbo-charged in recent years due to big data, GPU computing, novel learning algorithms and a very effective research methodology. By comparison, progress in robotics seems slower. It is true that robotics came later to exploring the potential of learning -- the advantages over the well-established body of knowledge in dynamics, kinematics, planning and control are still being debated, although reinforcement learning seems to offer real potential. However, the rapid development of computer vision compared to robotics cannot be attributed only to the former's adoption of deep learning. In this paper, we argue that the gains in computer vision are due to research methodology -- evaluation under strict constraints versus experiments; bold numbers versus videos.

CV · Nov 27, 2018
Probabilistic Object Detection: Definition and Evaluation

David Hall, Feras Dayoub, John Skinner et al.

We introduce Probabilistic Object Detection, the task of detecting objects in images and accurately quantifying the spatial and semantic uncertainties of the detections. Given the lack of methods capable of assessing such probabilistic object detections, we present the new Probability-based Detection Quality measure (PDQ). Unlike AP-based measures, PDQ has no arbitrary thresholds and rewards spatial and label quality, and foreground/background separation quality while explicitly penalising false positive and false negative detections. We contrast PDQ with existing mAP and moLRP measures by evaluating state-of-the-art detectors and a Bayesian object detector based on Monte Carlo Dropout. Our experiments indicate that conventional object detectors tend to be spatially overconfident and thus perform poorly on the task of probabilistic object detection. Our paper aims to encourage the development of new object detection approaches that provide detections with accurately estimated spatial and label uncertainties and are of critical importance for deployment on robots and embodied AI systems in the real world.

RO · Sep 23, 2018
Multi-View Picking: Next-best-view Reaching for Improved Grasping in Clutter

Douglas Morrison, Peter Corke, Jürgen Leitner

Camera viewpoint selection is an important aspect of visual grasp detection, especially in clutter where many occlusions are present. Where other approaches use a static camera position or fixed data collection routines, our Multi-View Picking (MVP) controller uses an active perception approach to choose informative viewpoints based directly on a distribution of grasp pose estimates in real time, reducing uncertainty in the grasp poses caused by clutter and occlusions. In trials of grasping 20 objects from clutter, our MVP controller achieves 80% grasp success, outperforming a single-viewpoint grasp detector by 12%. We also show that our approach is both more accurate and more efficient than approaches which consider multiple fixed viewpoints.

CV · May 31, 2018
Distinguishing Refracted Features using Light Field Cameras with Application to Structure from Motion

Dorian Tsai, Donald G Dansereau, Thierry Peynot et al.

Robots must reliably interact with refractive objects in many applications; however, refractive objects can cause many robotic vision algorithms to become unreliable or even fail, particularly feature-based matching applications, such as structure-from-motion. We propose a method to distinguish between refracted and Lambertian image features using a light field camera. Specifically, we propose to use textural cross-correlation to characterise apparent feature motion in a single light field, and compare this motion to its Lambertian equivalent based on 4D light field geometry. Our refracted feature distinguisher has a 34.3% higher rate of detection compared to state-of-the-art for light fields captured with large baselines relative to the refractive object. Our method also applies to light field cameras with much smaller baselines than previously considered, yielding up to 2 times better detection for 2D-refractive objects, such as a sphere, and up to 8 times better for 1D-refractive objects, such as a cylinder. For structure from motion, we demonstrate that rejecting refracted features using our distinguisher yields up to 42.4% lower reprojection error, and lower failure rate when the robot is approaching refractive objects. Our method leads to more robust robot vision in the presence of refractive objects.

RO · Apr 18, 2018
The Limits and Potentials of Deep Learning for Robotics

Niko Sünderhauf, Oliver Brock, Walter Scheirer et al.

The application of deep learning in robotics leads to very specific problems and research questions that are typically not addressed by the computer vision and machine learning communities. In this paper we discuss a number of robotics-specific learning, reasoning, and embodiment challenges for deep learning. We explain the need for better evaluation metrics, highlight the importance and unique challenges for deep robotic learning in simulation, and explore the spectrum between purely data-driven and model-driven approaches. We hope this paper provides a motivating overview of important research directions to overcome the current limitations, and helps fulfill the promising potentials of deep learning in robotics.

RO · Apr 14, 2018
Closing the Loop for Robotic Grasping: A Real-time, Generative Grasp Synthesis Approach

Douglas Morrison, Peter Corke, Jürgen Leitner

This paper presents a real-time, object-independent grasp synthesis method which can be used for closed-loop grasping. Our proposed Generative Grasping Convolutional Neural Network (GG-CNN) predicts the quality and pose of grasps at every pixel. This one-to-one mapping from a depth image overcomes limitations of current deep-learning grasping techniques by avoiding discrete sampling of grasp candidates and long computation times. Additionally, our GG-CNN is orders of magnitude smaller while detecting stable grasps with equivalent performance to current state-of-the-art techniques. The light-weight and single-pass generative nature of our GG-CNN allows for closed-loop control at up to 50Hz, enabling accurate grasping in non-static environments where objects move and in the presence of robot control inaccuracies. In our real-world tests, we achieve an 83% grasp success rate on a set of previously unseen objects with adversarial geometry and 88% on a set of household objects that are moved during the grasp attempt. We also achieve 81% accuracy when grasping in dynamic clutter.
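
The per-pixel formulation can be pictured with a small sketch: given dense maps predicted over the image, the grasp to execute is simply read off at the pixel with the highest predicted quality. The sketch below assumes three equally sized maps (quality, gripper angle and gripper width) stored as nested lists. The map names, the dictionary layout and the `best_grasp` helper are hypothetical and are not taken from the GG-CNN code.

```python
def best_grasp(quality, angle, width):
    """Return the grasp at the pixel with the highest predicted quality.

    quality, angle and width are equally sized 2D nested lists, standing in
    for per-pixel network outputs (an assumed layout for illustration).
    """
    best = None
    for r, row in enumerate(quality):
        for c, q in enumerate(row):
            if best is None or q > best[0]:
                best = (q, r, c)
    if best is None:
        raise ValueError("quality map is empty")
    q, r, c = best
    return {"row": r, "col": c, "quality": q,
            "angle": angle[r][c], "width": width[r][c]}
```

Because selecting a grasp is a single pass over the output maps with no candidate sampling, it can be repeated on every new depth frame, which is what makes closed-loop use practical.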

RO · Apr 6, 2018
Assisted Control for Semi-Autonomous Power Infrastructure Inspection using Aerial Vehicles

Aaron McFadyen, Feras Dayoub, Steve Martin et al.

This paper presents the design and implementation of an assisted control technology for a small multirotor platform for aerial inspection of fixed energy infrastructure. Sensor placement is supported by a theoretical analysis of expected sensor performance and constrained platform behaviour to speed up implementation. The optical sensors provide relative position information between the platform and the asset, which enables human operator inputs to be autonomously adjusted to ensure safe separation. The assisted control approach is designed to reduce operator workload during close proximity inspection tasks, with collision avoidance and safe separation managed autonomously. The energy infrastructure includes single vertical wooden poles and crossarms with attached overhead wires. Simulated and real experimental results are provided.

RO · Sep 18, 2017
Adversarial Discriminative Sim-to-real Transfer of Visuo-motor Policies

Fangyi Zhang, Jürgen Leitner, Zongyuan Ge et al.

Various approaches have been proposed to learn visuo-motor policies for real-world robotic applications. One solution is first learning in simulation then transferring to the real world. In the transfer, most existing approaches need real-world images with labels. However, the labelling process is often expensive or even impractical in many robotic applications. In this paper, we propose an adversarial discriminative sim-to-real transfer approach to reduce the cost of labelling real data. The effectiveness of the approach is demonstrated with modular networks in a table-top object reaching task where a 7 DoF arm is controlled in velocity mode to reach a blue cuboid in clutter through visual observations. The adversarial transfer approach reduced the labelled real data requirement by 50%. Policies can be transferred to real environments with only 93 labelled and 186 unlabelled real images. The transferred visuo-motor policies are robust to novel (not seen in training) objects in clutter and even a moving target, achieving a 97.8% success rate and 1.8 cm control accuracy.

ROMay 24, 2017
Visual Servoing from Deep Neural Networks

Quentin Bateux, Eric Marchand, Jürgen Leitner et al.

We present a deep neural network-based method to perform high-precision, robust and real-time 6 DOF visual servoing. The paper describes how to create a dataset simulating various perturbations (occlusions and lighting conditions) from a single real-world image of the scene. A convolutional neural network is fine-tuned using this dataset to estimate the relative pose between two images of the same scene. The output of the network is then employed in a visual servoing control scheme. The method converges robustly even in difficult real-world settings with strong lighting variations and occlusions. A positioning error of less than one millimeter is obtained in experiments with a 6 DOF robot.

CVMar 21, 2017
Episode-Based Active Learning with Bayesian Neural Networks

Feras Dayoub, Niko Sünderhauf, Peter Corke

We investigate different strategies for active learning with Bayesian deep neural networks. We focus our analysis on scenarios where new, unlabeled data is obtained episodically, as is commonly encountered in mobile robotics applications. An evaluation of different strategies for acquisition, updating, and final training on the CIFAR-10 dataset shows that incremental network updates with final training on the accumulated acquisition set are essential for best performance, while limiting the amount of required human labeling labor.

ROOct 21, 2016
Modular Deep Q Networks for Sim-to-real Transfer of Visuo-motor Policies

Fangyi Zhang, Jürgen Leitner, Michael Milford et al.

While deep learning has had significant successes in computer vision thanks to the abundance of visual data, collecting sufficiently large real-world datasets for robot learning can be costly. To increase the practicality of these techniques on real robots, we propose a modular deep reinforcement learning method capable of transferring models trained in simulation to a real-world robotic task. We introduce a bottleneck between perception and control, enabling the networks to be trained independently, but then merged and fine-tuned in an end-to-end manner to further improve hand-eye coordination. On a canonical, planar visually-guided robot reaching task a fine-tuned accuracy of 1.6 pixels is achieved, a significant improvement over naive transfer (17.5 pixels), showing the potential for more complicated and broader applications. Our method provides a technique for more efficient learning and transfer of visuo-motor policies for real robotic systems without relying entirely on large real-world robot datasets.

ROSep 17, 2016
The ACRV Picking Benchmark (APB): A Robotic Shelf Picking Benchmark to Foster Reproducible Research

Jürgen Leitner, Adam W. Tow, Jake E. Dean et al.

Robotic challenges like the Amazon Picking Challenge (APC) or the DARPA Challenges are an established and important way to drive scientific progress. They make research comparable on a well-defined benchmark with equal test conditions for all participants. However, such challenge events occur only occasionally, are limited to a small number of contestants, and the test conditions are very difficult to replicate after the main event. We present a new physical benchmark challenge for robotic picking: the ACRV Picking Benchmark (APB). Designed to be reproducible, it consists of a set of 42 common objects, a widely available shelf, and exact guidelines for object arrangement using stencils. A well-defined evaluation protocol enables the comparison of complete robotic systems, including perception and manipulation, instead of sub-systems only. Our paper also describes and reports results achieved by an open baseline system based on a Baxter robot.

CVAug 1, 2016
Exploiting Temporal Information for DCNN-based Fine-Grained Object Classification

ZongYuan Ge, Chris McCool, Conrad Sanderson et al.

Fine-grained classification is a relatively new field that has concentrated on using information from a single image, while ignoring the enormous potential of using video data to improve classification. In this work we present the novel task of video-based fine-grained object classification, propose a corresponding new video dataset, and perform a systematic study of several recent deep convolutional neural network (DCNN) based approaches, which we specifically adapt to the task. We evaluate three-dimensional DCNNs, two-stream DCNNs, and bilinear DCNNs. Two forms of the two-stream approach are used, where spatial and temporal data from two independent DCNNs are fused either via early fusion (combination of the fully-connected layers) or late fusion (concatenation of the softmax outputs of the DCNNs). For bilinear DCNNs, information from the convolutional layers of the spatial and temporal DCNNs is combined via local co-occurrences. We then fuse the bilinear DCNN and early fusion of the two-stream approach to combine the spatial and temporal information at the local and global level (Spatio-Temporal Co-occurrence). Using the new and challenging video dataset of birds, classification performance is improved from 23.1% (using single images) to 41.1% when using the Spatio-Temporal Co-occurrence system. Incorporating automatically detected bounding box location further improves the classification accuracy to 53.6%.

CVNov 30, 2015
Fine-Grained Classification via Mixture of Deep Convolutional Neural Networks

ZongYuan Ge, Alex Bewley, Christopher McCool et al.

We present a novel deep convolutional neural network (DCNN) system for fine-grained image classification, called a mixture of DCNNs (MixDCNN). The fine-grained image classification problem is characterised by large intra-class variations and small inter-class variations. To overcome these problems our proposed MixDCNN system partitions images into K subsets of similar images and learns an expert DCNN for each subset. The output from each of the K DCNNs is combined to form a single classification decision. In contrast to previous techniques, we provide a formulation to perform joint end-to-end training of the K DCNNs simultaneously. Extensive experiments, on three datasets using two network structures (AlexNet and GoogLeNet), show that the proposed MixDCNN system consistently outperforms other methods. It provides a relative improvement of 12.7% and achieves state-of-the-art results on two datasets.
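As a rough illustration of how the outputs of K expert networks might be merged into one decision, the sketch below weights each expert by a softmax over its own peak class score and then sums the weighted scores. This weighting rule, and the names `softmax` and `combine_experts`, are assumptions made for illustration; the abstract does not state the exact combination formula, and the joint end-to-end training is not shown.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def combine_experts(expert_scores):
    """Merge per-class scores from K experts into one class distribution.

    expert_scores: list of K lists, each holding one score per class.
    Assumption: an expert's confidence is its highest class score, and
    experts are weighted by a softmax over those confidences.
    """
    confidences = [max(scores) for scores in expert_scores]
    weights = softmax(confidences)
    num_classes = len(expert_scores[0])
    combined = [
        sum(w * scores[c] for w, scores in zip(weights, expert_scores))
        for c in range(num_classes)
    ]
    return softmax(combined), weights


# Hypothetical scores: expert 0 is confident about class 0, expert 1 is unsure.
probs, weights = combine_experts([[5.0, 0.0, 0.0], [0.1, 0.2, 0.0]])
predicted_class = probs.index(max(probs))
```

With this kind of weighting, an expert that is confident on an input dominates the final decision, while unsure experts contribute little.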

LGNov 12, 2015
Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control

Fangyi Zhang, Jürgen Leitner, Michael Milford et al.

This paper introduces a machine learning based system for controlling a robotic manipulator with visual perception only. The capability to autonomously learn robot controllers solely from raw-pixel images and without any prior knowledge of configuration is shown for the first time. We build upon the success of recent deep reinforcement learning and develop a system for learning target reaching with a three-joint robot manipulator using external visual observation. A Deep Q Network (DQN) was demonstrated to perform target reaching after training in simulation. Transferring the network to real hardware and real observation in a naive approach failed, but experiments show that the network works when replacing camera images with synthetic images.

ROJul 9, 2015
Place Categorization and Semantic Mapping on a Mobile Robot

Niko Sünderhauf, Feras Dayoub, Sean McMahon et al.

In this paper we focus on the challenging problem of place categorization and semantic mapping on a robot without environment-specific training. Motivated by their ongoing success in various visual recognition tasks, we build our system upon a state-of-the-art convolutional network. We overcome its closed-set limitations by complementing the network with a series of one-vs-all classifiers that can learn to recognize new semantic classes online. Prior domain knowledge is incorporated by embedding the classification system into a Bayesian filter framework that also ensures temporal coherence. We evaluate the classification accuracy of the system on a robot that maps a variety of places on our campus in real-time. We show how semantic information can boost robotic object detection performance and how the semantic map can be used to modulate the robot's behaviour during navigation tasks. The system is made available to the community as a ROS module.
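The temporal-coherence idea can be sketched as a discrete Bayes filter over place classes: a "sticky" transition model predicts that the robot usually stays in the same kind of place, and each frame's classifier output is used as a likelihood. The transition model, the `stay_prob` value and the function name `bayes_filter_step` are assumptions for illustration, not the configuration used in the paper.

```python
def bayes_filter_step(belief, likelihood, stay_prob=0.9):
    """One predict/update step of a discrete Bayes filter.

    belief:     current probability per place class (sums to 1).
    likelihood: per-class classifier score for the new frame.
    stay_prob:  assumed probability of remaining in the same class;
                the rest is spread evenly over the other classes.
    Requires at least two classes.
    """
    n = len(belief)
    switch_prob = (1.0 - stay_prob) / (n - 1)
    total_belief = sum(belief)
    # Predict: mostly stay in the same class, occasionally switch.
    predicted = [
        stay_prob * belief[i] + switch_prob * (total_belief - belief[i])
        for i in range(n)
    ]
    # Update: weight the prediction by the frame's classifier output.
    unnormalised = [p * l for p, l in zip(predicted, likelihood)]
    norm = sum(unnormalised)
    return [u / norm for u in unnormalised]


# Hypothetical per-frame scores for three classes; the third frame is an outlier.
frames = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.7, 0.2, 0.1]]
belief = [1.0 / 3.0] * 3
history = []
for frame in frames:
    belief = bayes_filter_step(belief, frame)
    history.append(belief)
```

In this example the single outlier frame favours class 1, but the filtered belief keeps class 0 on top, which is the smoothing effect a filter of this kind provides.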

CVMay 9, 2015
Subset Feature Learning for Fine-Grained Category Classification

Zongyuan Ge, Christopher McCool, Conrad Sanderson et al.

Fine-grained categorisation has been a challenging problem due to small inter-class variation, large intra-class variation and a low number of training images. We propose a learning system which first clusters visually similar classes and then learns deep convolutional neural network features specific to each subset. Experiments on the popular fine-grained Caltech-UCSD bird dataset show that the proposed method outperforms recent fine-grained categorisation methods under the most difficult setting: no bounding boxes are presented at test time. It achieves a mean accuracy of 77.5%, compared to the previous best performance of 73.2%. We also show that progressive transfer learning allows us to first learn domain-generic features (for bird classification) which can then be adapted to a specific set of bird classes, yielding improvements in accuracy.

CVFeb 27, 2015
Modelling Local Deep Convolutional Neural Network Features to Improve Fine-Grained Image Classification

ZongYuan Ge, Chris McCool, Conrad Sanderson et al.

We propose a local modelling approach using deep convolutional neural networks (CNNs) for fine-grained image classification. Recently, deep CNNs trained from large datasets have considerably improved the performance of object recognition. However, to date there has been limited work using these deep CNNs as local feature extractors. This partly stems from CNNs having internal representations which are high dimensional, thereby making such representations difficult to model using stochastic models. To overcome this issue, we propose to reduce the dimensionality of one of the internal fully connected layers, in conjunction with layer-restricted retraining to avoid retraining the entire network. The distribution of low-dimensional features obtained from the modified layer is then modelled using a Gaussian mixture model. Comparative experiments show that considerable performance improvements can be achieved on the challenging Fish and UEC FOOD-100 datasets.