RO May 19, 2022
Dexterous Robotic Manipulation using Deep Reinforcement Learning and Knowledge Transfer for Complex Sparse Reward-based Tasks
Qiang Wang, Francisco Roldan Sanchez, Robert McCarthy et al.
This paper describes a deep reinforcement learning (DRL) approach that won Phase 1 of the Real Robot Challenge (RRC) 2021, and then extends this method to a more difficult manipulation task. The RRC consisted of using a TriFinger robot to manipulate a cube along a specified positional trajectory, but with no requirement for the cube to have any specific orientation. We used a relatively simple reward function, a combination of goal-based sparse reward and distance reward, in conjunction with Hindsight Experience Replay (HER) to guide the learning of the DRL agent (Deep Deterministic Policy Gradient (DDPG)). Our approach allowed our agents to acquire dexterous robotic manipulation strategies in simulation. These strategies were then applied to the real robot and outperformed all other competition submissions, including those using more traditional robotic control techniques, in the final evaluation stage of the RRC. Here we extend this method, by modifying the task of Phase 1 of the RRC to require the robot to maintain the cube in a particular orientation, while the cube is moved along the required positional trajectory. The requirement to also orient the cube makes the agent unable to learn the task through blind exploration due to increased problem complexity. To circumvent this issue, we make novel use of a Knowledge Transfer (KT) technique that allows the strategies learned by the agent in the original task (which was agnostic to cube orientation) to be transferred to this task (where orientation matters). KT allowed the agent to learn and perform the extended task in the simulator, which improved the average positional deviation from 0.134 m to 0.02 m, and average orientation deviation from 142° to 76° during evaluation. This KT concept shows good generalisation properties and could be applied to any actor-critic learning algorithm.
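As a rough illustration of how Hindsight Experience Replay turns failed episodes into useful training data, the sketch below relabels an episode with the goal it actually reached. The transition layout, the `tol` threshold, and the "final" relabeling strategy are assumptions made for this example; the abstract does not say which strategy was used, and only the sparse part of the reward is shown.

```python
import math


def sparse_reward(achieved, goal, tol=0.02):
    """Goal-based sparse reward: 0 when within tol of the goal, otherwise -1."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0


def her_relabel_final(episode, tol=0.02):
    """Relabel every transition with the goal achieved at the end of the episode.

    episode: list of dicts with keys "achieved_goal" and "goal" (hypothetical layout).
    Returns new transitions whose "goal" and "reward" reflect the substituted goal.
    """
    new_goal = episode[-1]["achieved_goal"]
    relabeled = []
    for transition in episode:
        reward = sparse_reward(transition["achieved_goal"], new_goal, tol)
        relabeled.append({**transition, "goal": new_goal, "reward": reward})
    return relabeled


episode = [
    {"achieved_goal": (0.0, 0.0, 0.0), "goal": (1.0, 1.0, 1.0)},
    {"achieved_goal": (0.1, 0.0, 0.0), "goal": (1.0, 1.0, 1.0)},
]
for transition in her_relabel_final(episode):
    print(transition["goal"], transition["reward"])
```

The relabeled transitions would normally be stored in the replay buffer alongside the originals, so the agent sees at least some non-negative rewards even when the real goal was never reached.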
SP Aug 9, 2022
Adaptive Target-Condition Neural Network: DNN-Aided Load Balancing for Hybrid LiFi and WiFi Networks
Han Ji, Qiang Wang, Stephen J. Redmond et al.
Load balancing (LB) is a challenging issue in hybrid light fidelity (LiFi) and wireless fidelity (WiFi) networks (HLWNets), due to the nature of heterogeneous access points (APs). Machine learning has the potential to provide a complexity-friendly LB solution with near-optimal network performance, at the cost of a training process. The state-of-the-art (SOTA) learning-aided LB methods, however, need retraining when the network environment (especially the number of users) changes, significantly limiting their practicability. In this paper, a novel deep neural network (DNN) structure named adaptive target-condition neural network (A-TCNN) is proposed, which conducts AP selection for one target user upon the condition of other users. Also, an adaptive mechanism is developed to map a smaller number of users to a larger number through splitting their data rate requirements, without affecting the AP selection result for the target user. This enables the proposed method to handle different numbers of users without the need for retraining. Results show that A-TCNN achieves a network throughput very close to that of the testing dataset, with a gap of less than 3%. It is also proven that A-TCNN can obtain a network throughput comparable to two SOTA benchmarks, while reducing the runtime by up to three orders of magnitude.
LG Jan 27, 2023
Improving Behavioural Cloning with Positive Unlabeled Learning
Qiang Wang, Robert McCarthy, David Cordova Bulens et al.
Learning control policies offline from pre-recorded datasets is a promising avenue for solving challenging real-world problems. However, available datasets are typically of mixed quality, with a limited number of the trajectories that we would consider as positive examples; i.e., high-quality demonstrations. Therefore, we propose a novel iterative learning algorithm for identifying expert trajectories in unlabeled mixed-quality robotics datasets given a minimal set of positive examples, surpassing existing algorithms in terms of accuracy. We show that applying behavioral cloning to the resulting filtered dataset outperforms several competitive offline reinforcement learning and imitation learning baselines. We perform experiments on a range of simulated locomotion tasks and on two challenging manipulation tasks on a real robotic system; in these experiments, our method showcases state-of-the-art performance. Our website: https://sites.google.com/view/offline-policy-learning-pubc.
RO Jan 30, 2023
Identifying Expert Behavior in Offline Training Datasets Improves Behavioral Cloning of Robotic Manipulation Policies
Qiang Wang, Robert McCarthy, David Cordova Bulens et al.
This paper presents our solution for the Real Robot Challenge (RRC) III, a competition featured in the NeurIPS 2022 Competition Track, aimed at addressing dexterous robotic manipulation tasks through learning from pre-collected offline data. Participants were provided with two types of datasets for each task: expert and mixed datasets with varying skill levels. The simplest offline policy learning algorithm, Behavioral Cloning (BC), performed remarkably well when trained on expert datasets, outperforming even the most advanced offline reinforcement learning (RL) algorithms. However, BC's performance deteriorated when applied to mixed datasets, and the performance of offline RL algorithms was also unsatisfactory. Upon examining the mixed datasets, we observed that they contained a significant amount of expert data, although this data was unlabeled. To address this issue, we proposed a semi-supervised learning-based classifier to identify the underlying expert behavior within mixed datasets, effectively isolating the expert data. To further enhance BC's performance, we leveraged the geometric symmetry of the RRC arena to augment the training dataset through mathematical transformations. In the end, our submission surpassed that of all other participants, even those who employed complex offline RL algorithms and intricate data processing and feature engineering techniques.
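The symmetry-based augmentation can be pictured with a minimal sketch. It assumes the arena has three-fold rotational symmetry about the vertical axis, so rotating recorded positions by multiples of 120° yields equally valid samples. The abstract does not specify the transformation, so `n_fold`, the function names, and the restriction to positions are assumptions; a full version would also need to remap finger indices and actions consistently.

```python
import math


def rotate_z(point, angle):
    """Rotate a 3D point about the vertical (z) axis by angle radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = point
    return (c * x - s * y, s * x + c * y, z)


def augment_positions(positions, n_fold=3):
    """Return n_fold rotated copies of a list of positions (the first is unrotated)."""
    copies = []
    for k in range(n_fold):
        angle = 2.0 * math.pi * k / n_fold
        copies.append([rotate_z(p, angle) for p in positions])
    return copies


for copy in augment_positions([(1.0, 0.0, 0.5)]):
    print([tuple(round(v, 3) for v in p) for p in copy])
```

Under this assumption, one recorded trajectory yields three training trajectories at no extra data-collection cost.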
RO Jul 8, 2023
Robust Learning-Based Incipient Slip Detection using the PapillArray Optical Tactile Sensor for Improved Robotic Gripping
Qiang Wang, Pablo Martinez Ulloa, Robert Burke et al.
The ability to detect slip, particularly incipient slip, enables robotic systems to take corrective measures to prevent a grasped object from being dropped. Therefore, slip detection can enhance the overall security of robotic gripping. However, accurately detecting incipient slip remains a significant challenge. In this paper, we propose a novel learning-based approach to detect incipient slip using the PapillArray (Contactile, Australia) tactile sensor. The resulting model is highly effective in identifying patterns associated with incipient slip, achieving a detection success rate of 95.6% when tested with an offline dataset. Furthermore, we introduce several data augmentation methods to enhance the robustness of our model. When transferring the trained model to a robotic gripping environment distinct from where the training data was collected, our model maintained robust performance, with a success rate of 96.8%, providing timely feedback for stabilizing several practical gripping tasks. Our project website: https://sites.google.com/view/incipient-slip-detection.
LG Feb 14, 2024
Dataset Clustering for Improved Offline Policy Learning
Qiang Wang, Yixin Deng, Francisco Roldan Sanchez et al.
Offline policy learning aims to discover decision-making policies from previously-collected datasets without additional online interactions with the environment. As the training dataset is fixed, its quality becomes a crucial determining factor in the performance of the learned policy. This paper studies a dataset characteristic that we refer to as multi-behavior, indicating that the dataset is collected using multiple policies that exhibit distinct behaviors. In contrast, a uni-behavior dataset would be collected solely using one policy. We observed that policies learned from a uni-behavior dataset typically outperform those learned from multi-behavior datasets, despite the uni-behavior dataset having fewer examples and less diversity. Therefore, we propose a behavior-aware deep clustering approach that partitions multi-behavior datasets into several uni-behavior subsets, thereby benefiting downstream policy learning. Our approach is flexible and effective; it can adaptively estimate the number of clusters while demonstrating high clustering accuracy, achieving an average Adjusted Rand Index of 0.987 across various continuous control task datasets. Finally, we present improved policy learning examples using dataset clustering and discuss several potential scenarios where our approach might benefit the offline policy learning community.
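For readers unfamiliar with the Adjusted Rand Index quoted above, the following sketch computes it from two label lists using the standard pair-counting formula. It is a plain reference implementation written for this note, not the authors' evaluation code; a score of 1 means the clustering matches the ground truth up to a renaming of clusters, and values near 0 indicate chance-level agreement.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same samples."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("label lists must have the same length")
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Pair counts within contingency cells, true clusters, and predicted clusters.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2.0
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings form a single cluster
    return (sum_cells - expected) / (max_index - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```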
LG Oct 5, 2021
Imaginary Hindsight Experience Replay: Curious Model-based Learning for Sparse Reward Tasks
Robert McCarthy, Qiang Wang, Stephen J. Redmond
Model-based reinforcement learning is a promising learning strategy for practical robotic applications due to its improved data-efficiency versus model-free counterparts. However, current state-of-the-art model-based methods rely on shaped reward signals, which can be difficult to design and implement. To remedy this, we propose a simple model-based method tailored for sparse-reward multi-goal tasks that foregoes the need for complicated reward engineering. This approach, termed Imaginary Hindsight Experience Replay, minimises real-world interactions by incorporating imaginary data into policy updates. To improve exploration in the sparse-reward setting, the policy is trained with standard Hindsight Experience Replay and endowed with curiosity-based intrinsic rewards. Upon evaluation, this approach provides an order of magnitude increase in data-efficiency on average versus the state-of-the-art model-free method in the benchmark OpenAI Gym Fetch Robotics tasks.
RO Sep 30, 2021
Solving the Real Robot Challenge using Deep Reinforcement Learning
Robert McCarthy, Francisco Roldan Sanchez, Qiang Wang et al.
This paper details our winning submission to Phase 1 of the 2021 Real Robot Challenge, in which a three-fingered robot must carry a cube along specified goal trajectories. To solve Phase 1, we use a pure reinforcement learning approach which requires minimal expert knowledge of the robotic system, or of robotic grasping in general. A sparse, goal-based reward is employed in conjunction with Hindsight Experience Replay to teach the control policy to move the cube to the desired x and y coordinates of the goal. Simultaneously, a dense distance-based reward is employed to teach the policy to lift the cube to the z coordinate (the height component) of the goal. The policy is trained in simulation with domain randomisation before being transferred to the real robot for evaluation. Although performance tends to worsen after this transfer, our best policy can successfully lift the real cube along goal trajectories via an effective pinching grasp. Our approach outperforms all other submissions, including those leveraging more traditional robotic control techniques, and is the first pure learning-based method to solve this challenge.
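The split between a sparse reward in x-y and a dense reward in z can be sketched as a single function. The tolerance `xy_tol`, the weight `z_weight`, and the exact functional form are assumptions chosen for illustration; the abstract states only which axes each term acts on.

```python
import math


def mixed_reward(cube_pos, goal_pos, xy_tol=0.02, z_weight=1.0):
    """Sparse goal-based term in x-y plus a dense distance term in z.

    cube_pos, goal_pos: (x, y, z) positions in metres.
    xy_tol, z_weight: hypothetical tolerance and scaling, not taken from the paper.
    """
    dx = cube_pos[0] - goal_pos[0]
    dy = cube_pos[1] - goal_pos[1]
    dz = cube_pos[2] - goal_pos[2]
    sparse_xy = 0.0 if math.hypot(dx, dy) <= xy_tol else -1.0
    dense_z = -z_weight * abs(dz)
    return sparse_xy + dense_z


print(mixed_reward((0.0, 0.0, 0.10), (0.0, 0.0, 0.10)))  # on target
print(mixed_reward((0.1, 0.0, 0.10), (0.0, 0.0, 0.10)))  # off in x-y, correct height
print(mixed_reward((0.0, 0.0, 0.00), (0.0, 0.0, 0.10)))  # right x-y, not lifted
```

Because only the x-y term is sparse, hindsight relabeling applies naturally to it, while the dense z term gives the policy a continuous signal for lifting.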
RO Sep 22, 2021
Real Robot Challenge: A Robotics Competition in the Cloud
Stefan Bauer, Felix Widmaier, Manuel Wüthrich et al.
Dexterous manipulation remains an open problem in robotics. To coordinate efforts of the research community towards tackling this problem, we propose a shared benchmark. We designed and built robotic platforms that are hosted at MPI for Intelligent Systems and can be accessed remotely. Each platform consists of three robotic fingers that are capable of dexterous object manipulation. Users are able to control the platforms remotely by submitting code that is executed automatically, akin to a computational cluster. Using this setup, i) we host robotics competitions, where teams from anywhere in the world access our platforms to tackle challenging tasks ii) we publish the datasets collected during these competitions (consisting of hundreds of robot hours), and iii) we give researchers access to these platforms for their own projects.
RO Mar 21, 2021
Estimating Lower Body Kinematics using a Lie Group Constrained Extended Kalman Filter and Reduced IMU Count
Luke Wicent Sy, Nigel H. Lovell, Stephen J. Redmond
Goal: This paper presents an algorithm for estimating pelvis, thigh, shank, and foot kinematics during walking using only two or three wearable inertial sensors. Methods: The algorithm makes novel use of a Lie-group-based extended Kalman filter. The algorithm iterates through the prediction (kinematic equation), measurement (pelvis position pseudo-measurements, zero-velocity update, and flat-floor assumption), and constraint update (hinged knee and ankle joints, constant leg lengths). Results: The inertial motion capture algorithm was extensively evaluated on two datasets showing its performance against two standard benchmark approaches in optical motion capture (i.e., plug-in gait (commonly used in gait analysis) and a kinematic fit (commonly used in animation, robotics, and musculoskeleton simulation)), giving insight into the similarity and differences between the said approaches used in different application areas. The overall mean body segment position (relative to mid-pelvis origin) and orientation error magnitude of our algorithm ($n=14$ participants) for free walking was $5.93 \pm 1.33$ cm and $13.43 \pm 1.89^\circ$ when using three IMUs placed on the feet and pelvis, and $6.35 \pm 1.20$ cm and $12.71 \pm 1.60^\circ$ when using only two IMUs placed on the feet. Conclusion: The algorithm was able to track the joint angles in the sagittal plane for straight walking well, but requires improvement for unscripted movements (e.g., turning around, side steps), especially for dynamic movements or when considering clinical applications. Significance: This work has brought us closer to comprehensive remote gait monitoring using IMUs on the shoes. The low computational cost also suggests that it can be used in real-time with gait assistive devices.
RO Aug 16, 2020
A Biomimetic Tactile Fingerprint Induces Incipient Slip
Jasper W. James, Stephen J. Redmond, Nathan F. Lepora
We present a modified TacTip biomimetic optical tactile sensor design which demonstrates the ability to induce and detect incipient slip, as confirmed by recording the movement of markers on the sensor's external surface. Incipient slip is defined as slippage of part, but not all, of the contact surface between the sensor and object. The addition of ridges, which mimic the friction ridges in the human fingertip, in a concentric ring pattern allowed for localised shear deformation to occur on the sensor surface for a significant duration prior to the onset of gross slip. By detecting incipient slip we were able to predict when several differently shaped objects were at risk of falling and prevent them from doing so. Detecting incipient slip is useful because a corrective action can be taken before slippage occurs across the entire contact area, thus minimising the risk of objects being dropped.
RO Oct 4, 2019
Estimating Lower Limb Kinematics using a Lie Group Constrained EKF and a Reduced Wearable IMU Count
Luke Sy, Nigel H. Lovell, Stephen J. Redmond
This paper presents an algorithm that makes novel use of a Lie group representation of position and orientation alongside a constrained extended Kalman filter (CEKF) to accurately estimate pelvis, thigh, and shank kinematics during walking using only three wearable inertial sensors. The algorithm iterates through the prediction update (kinematic equation), measurement update (pelvis height, zero velocity update, flat-floor assumption, and covariance limiter), and constraint update (formulation of hinged knee joints and ball-and-socket hip joints). The paper also describes a novel Lie group formulation of the assumptions implemented in the said measurement and constraint updates. Evaluation of the algorithm on nine healthy subjects who walked freely within a $4 \times 4$ m$^2$ room shows that the knee and hip joint angle root-mean-square errors (RMSEs) in the sagittal plane for free walking were $10.5 \pm 2.8^\circ$ and $9.7 \pm 3.3^\circ$, respectively, while the correlation coefficients (CCs) were $0.89 \pm 0.06$ and $0.78 \pm 0.09$, respectively. The evaluation demonstrates a promising application of Lie group representation to inertial motion capture under reduced-sensor-count configuration, improving the estimates (i.e., joint angle RMSEs and CCs) for dynamic motion, and enabling better convergence for our non-linear biomechanical constraints. To further improve performance, additional information relating the pelvis and ankle kinematics is needed.
RO Oct 2, 2019
Estimating Lower Limb Kinematics using a Reduced Wearable Sensor Count
Luke Sy, Michael Raitor, Michael Del Rosario et al.
Goal: This paper presents an algorithm for accurately estimating pelvis, thigh, and shank kinematics during walking using only three wearable inertial sensors. Methods: The algorithm makes novel use of a constrained Kalman filter (CKF). The algorithm iterates through the prediction (kinematic equation), measurement (pelvis position pseudo-measurements, zero velocity update, flat-floor assumption, and covariance limiter), and constraint update (formulation of hinged knee joints and ball-and-socket hip joints). Results: Evaluation of the algorithm using an optical motion capture-based sensor-to-segment calibration on nine participants ($7$ men and $2$ women, weight $63.0 \pm 6.8$ kg, height $1.70 \pm 0.06$ m, age $24.6 \pm 3.9$ years old), with no known gait or lower body biomechanical abnormalities, who walked within a $4 \times 4$ m$^2$ capture area shows that it can track motion relative to the mid-pelvis origin with mean position and orientation (no bias) root-mean-square error (RMSE) of $5.21 \pm 1.3$ cm and $16.1 \pm 3.2^\circ$, respectively. The sagittal knee and hip joint angle RMSEs (no bias) were $10.0 \pm 2.9^\circ$ and $9.9 \pm 3.2^\circ$, respectively, while the corresponding correlation coefficient (CC) values were $0.87 \pm 0.08$ and $0.74 \pm 0.12$. Conclusion: The CKF-based algorithm was able to track the 3D pose of the pelvis, thigh, and shanks using only three inertial sensors worn on the pelvis and shanks. Significance: Due to the Kalman-filter-based algorithm's low computation cost and the relative convenience of using only three wearable sensors, gait parameters can be computed in real-time and remotely for long-term gait monitoring. Furthermore, the system can be used to inform real-time gait assistive devices.
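To give a feel for the constraint update step in a constrained Kalman filter, the sketch below projects a state estimate onto a single linear equality constraint, using identity weighting. This is a generic textbook-style projection written for illustration; the papers' actual constraints (hinged knees, ball-and-socket hips) are nonlinear and their formulation is not reproduced here. The function name and the example constraint are assumptions.

```python
def project_onto_constraint(state, d_row, d_val):
    """Project state onto the hyperplane d_row . x = d_val (identity weighting).

    state: list of floats, the unconstrained estimate after the measurement update.
    d_row: list of floats, coefficients of one linear equality constraint.
    d_val: float, right-hand side of the constraint.
    """
    norm_sq = sum(d * d for d in d_row)
    if norm_sq == 0.0:
        raise ValueError("constraint row must be non-zero")
    residual = sum(d * x for d, x in zip(d_row, state)) - d_val
    return [x - d * residual / norm_sq for x, d in zip(state, d_row)]


# Hypothetical example: force the third state component (say, a foot height) to zero.
print(project_onto_constraint([1.0, 2.0, 0.3], [0.0, 0.0, 1.0], 0.0))
```

In a full filter this projection would typically be weighted by the state covariance, so that uncertain components absorb more of the correction than well-observed ones.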