CV | Aug 5, 2020 | Code
Pose-based Modular Network for Human-Object Interaction Detection
Zhijun Liang, Junfa Liu, Yisheng Guan et al.
Human-object interaction (HOI) detection is a critical task in scene understanding. The goal is to infer the triplet <subject, predicate, object> in a scene. In this work, we note that the human pose itself, as well as the spatial position of the human pose relative to the target object, can provide informative cues for HOI detection. We contribute a Pose-based Modular Network (PMN) that exploits absolute pose features and relative spatial pose features to improve HOI detection and is fully compatible with existing networks. Our module consists of two branches: one first processes the relative spatial pose features of each joint independently, while the other updates the absolute pose features via fully connected graph structures. The processed pose features are then fed into an action classifier. To evaluate our proposed method, we combine the module with the state-of-the-art model VS-GATs and obtain significant improvement on two public benchmarks, V-COCO and HICO-DET, which shows its efficacy and flexibility. Code is available at https://github.com/birlrobotics/PMN.
LG | Jan 7, 2020 | Code
Visual-Semantic Graph Attention Networks for Human-Object Interaction Detection
Zhijun Liang, Juan Rojas, Junfa Liu et al.
In scene understanding, robots benefit not only from detecting individual scene instances but also from learning their possible interactions. Human-Object Interaction (HOI) detection infers the action predicate in a <human, predicate, object> triplet. Contextual information has been found critical in inferring interactions. However, most works only use local features from a single human-object pair for inference. Few works have studied the disambiguating contribution of subsidiary relations made available via graph networks. Similarly, few have learned to effectively leverage visual cues along with the intrinsic semantic regularities contained in HOIs. We contribute a dual-graph attention network that, through attention mechanisms, dynamically aggregates contextual visual, spatial, and semantic information from primary human-object relations as well as subsidiary relations, giving it strong disambiguating power. We achieve comparable results on two benchmarks: V-COCO and HICO-DET. Code is available at https://github.com/birlrobotics/vs-gats.
RO | Aug 15, 2025
Multi-Group Equivariant Augmentation for Reinforcement Learning in Robot Manipulation
Hongbin Lin, Juan Rojas, Kwok Wai Samuel Au
Sampling efficiency is critical for deploying visuomotor learning in real-world robotic manipulation. While task symmetry has emerged as a promising inductive bias to improve efficiency, most prior work is limited to isometric symmetries -- applying the same group transformation to all task objects across all timesteps. In this work, we explore non-isometric symmetries, applying multiple independent group transformations across spatial and temporal dimensions to relax these constraints. We introduce a novel formulation of the partially observable Markov decision process (POMDP) that incorporates the non-isometric symmetry structures, and propose a simple yet effective data augmentation method, Multi-Group Equivariance Augmentation (MEA). We integrate MEA with offline reinforcement learning to enhance sampling efficiency, and introduce a voxel-based visual representation that preserves translational equivariance. Extensive simulation and real-robot experiments across two manipulation domains demonstrate the effectiveness of our approach.
RO | Jun 15, 2021
Towards Safe Control of Continuum Manipulator Using Shielded Multiagent Reinforcement Learning
Guanglin Ji, Junyan Yan, Jingxin Du et al.
Continuum robotic manipulators are increasingly adopted in minimally invasive surgery. However, their nonlinear behavior is challenging to model accurately, especially when subject to external interaction, potentially leading to poor control performance. In this letter, we investigate the feasibility of adopting model-free multiagent reinforcement learning (RL), namely a multiagent deep Q network (MADQN), to control a two-degree-of-freedom (DoF) cable-driven continuum surgical manipulator. To improve learning efficiency, control of the robot is formulated as a one-DoF, one-agent problem in the MADQN framework. Combined with a shielding scheme that enables dynamic variation of the action set boundary, MADQN leads to efficient and, importantly, safer control of the robot. Shielded MADQN enabled the robot to perform point and trajectory tracking with submillimeter root mean square errors under external loads, soft obstacles, and rigid collision, which are common interaction scenarios encountered by surgical manipulators. The controller was further proven effective on a miniature continuum robot with high structural nonlinearity, achieving trajectory tracking with submillimeter accuracy under external payload.
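The shielding idea can be illustrated with a minimal sketch. It makes several hypothetical assumptions: each agent controls one DoF with three discrete actions (decrease, hold, or increase the command by a fixed step), and the shield removes any action whose result would leave the current bounds. The names `shielded_actions` and `select_action` and the bound values are illustrative only and are not the authors' implementation. The bounds are passed in on every call, so they can change at each step. This mimics a dynamically varying action-set boundary.

```python
import random

# Hypothetical discrete action set for one agent controlling one DoF:
# decrement, hold, or increment the actuator command by a fixed step.
ACTIONS = (-1, 0, 1)


def shielded_actions(position, step, lower, upper):
    """Return the actions whose resulting position stays inside [lower, upper]."""
    return [a for a in ACTIONS if lower <= position + a * step <= upper]


def select_action(q_values, position, step, lower, upper, epsilon=0.0, rng=random):
    """Epsilon-greedy action selection restricted to the shielded (safe) action set.

    q_values maps each action in ACTIONS to its estimated Q-value.
    lower/upper may differ on every call (a dynamic action-set boundary).
    """
    allowed = shielded_actions(position, step, lower, upper)
    if not allowed:
        raise ValueError("no safe action available")
    if rng.random() < epsilon:
        return rng.choice(allowed)  # explore, but only among safe actions
    return max(allowed, key=lambda a: q_values[a])  # greedy among safe actions


q = {-1: 0.1, 0: 0.2, 1: 0.9}
print(select_action(q, 9.5, 1.0, 0.0, 10.0))  # +1 would exceed the bound, so 0 is chosen
```

In the one-agent-per-DoF formulation above, each agent would call `select_action` with its own Q-values and its own bounds.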
RO | Mar 24, 2021
Error Identification and Recovery in Robotic Snap Assembly
Yusuke Hayami, Weiwei Wan, Keisuke Koyama et al.
Existing methods for robotic snap joint assembly cannot predict failures before they occur. To address this limitation, this paper proposes a method that predicts error states before an error occurs, thereby enabling timely recovery. Robotic snap joint assembly requires precise positioning; therefore, even a slight offset between parts can lead to assembly failure. To correctly predict error states, we apply functional principal component analysis (fPCA) to 6D force/torque profiles that are terminated before an error occurs. The error state is identified by feeding the resulting feature vector to a decision tree in which a support vector machine (SVM) is employed at each node. If the estimation accuracy is low, we perform additional probing to identify the error state more accurately. Finally, after identifying the error state, the robot performs the corresponding error recovery motion. Experimental results from assembling plastic parts with four snap joints show that the error states can be correctly estimated and that the robot can recover from the identified error state.
RO | Oct 16, 2020
Hyperparameter Auto-tuning in Self-Supervised Robotic Learning
Jiancong Huang, Juan Rojas, Matthieu Zimmer et al.
Policy optimization in reinforcement learning requires the selection of numerous hyperparameters across different environments. Setting them incorrectly may degrade optimization performance, leading notably to insufficient or redundant learning. Insufficient learning (due to convergence to local optima) results in under-performing policies, whilst redundant learning wastes time and resources. These effects are further exacerbated when single policies are used to solve multi-task learning problems. Observing that the Evidence Lower Bound (ELBO) used in Variational Auto-Encoders correlates with the diversity of image samples, we propose an auto-tuning technique based on the ELBO for self-supervised reinforcement learning. Our approach can auto-tune three hyperparameters: the replay buffer size, the number of policy gradient updates during each epoch, and the number of exploration steps during each epoch. We use a state-of-the-art self-supervised robot learning framework (Reinforcement Learning with Imagined Goals (RIG) using Soft Actor-Critic) as the baseline for experimental verification. Experiments show that our method can auto-tune online and yields the best performance at a fraction of the time and computational resources. Code, video, and appendix for simulated and real-robot experiments can be found at the project page www.JuanRojas.net/autotune.
CV | Mar 11, 2020
A Graph Attention Spatio-temporal Convolutional Network for 3D Human Pose Estimation in Video
Junfa Liu, Juan Rojas, Zhijun Liang et al.
Spatio-temporal information is key to resolving occlusion and depth ambiguity in 3D pose estimation. Previous methods have focused on either temporal contexts or local-to-global architectures that embed fixed-length spatio-temporal information. To date, there have been no effective proposals that simultaneously and flexibly capture varying spatio-temporal sequences and achieve real-time 3D pose estimation. In this work, we improve the learning of kinematic constraints in the human skeleton (posture, local kinematic connections, and symmetry) by modeling local and global spatial information via attention mechanisms. To adapt to single- and multi-frame estimation, a dilated temporal model is employed to process varying skeleton sequences. Importantly, we also carefully design the interleaving of spatial semantics with temporal dependencies to achieve a synergistic effect. To this end, we propose a simple yet effective graph attention spatio-temporal convolutional network (GAST-Net) that comprises interleaved temporal convolutional and graph attention blocks. Experiments on two challenging benchmark datasets (Human3.6M and HumanEva-I) and YouTube videos demonstrate that our approach effectively mitigates depth ambiguity and self-occlusion, generalizes to half upper body estimation, and achieves competitive performance on 2D-to-3D video pose estimation. Code, video, and supplementary information are available at http://www.juanrojas.net/gast/.
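The receptive field of a dilated temporal model is the number of input frames that one output pose can see. It follows from simple arithmetic. The sketch below assumes stride-1 1D convolutions with kernel size 3 and dilation factors that grow by powers of 3. This is a common configuration for temporal pose models, but these values are assumptions, not GAST-Net's published settings.

```python
def receptive_field(kernel_size, dilations):
    """Input frames seen by one output of stacked stride-1 dilated 1D convolutions.

    Each layer with dilation d widens the window by (kernel_size - 1) * d frames.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Assumed configuration: kernel size 3, dilations growing by powers of 3.
for dilations in ([1, 3, 9], [1, 3, 9, 27], [1, 3, 9, 27, 81]):
    print(dilations, receptive_field(3, dilations))  # prints 27, 81, and 243 frames
```

Adding one layer triples the window under this scheme, which is why such models can be switched between short and long input sequences by changing depth.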
AI | Oct 19, 2019
Towards More Sample Efficiency in Reinforcement Learning with Data Augmentation
Yijiong Lin, Jiancong Huang, Matthieu Zimmer et al.
Deep reinforcement learning (DRL) is a promising approach for adaptive robot control, but its application to robotics is currently hindered by high sample requirements. We propose two novel data augmentation techniques for DRL in order to reuse observed data more efficiently. The first, called Kaleidoscope Experience Replay, exploits reflectional symmetries, while the second, called Goal-augmented Experience Replay, takes advantage of lax goal definitions. Our preliminary experimental results show a large increase in learning speed.
RO | Sep 24, 2019
Invariant Transform Experience Replay: Data Augmentation for Deep Reinforcement Learning
Yijiong Lin, Jiancong Huang, Matthieu Zimmer et al.
Deep Reinforcement Learning (RL) is a promising approach for adaptive robot control, but its application to robotics is currently hindered by high sample requirements. To alleviate this issue, we propose to exploit the symmetries present in robotic tasks. Intuitively, symmetries from observed trajectories define transformations that leave the space of feasible RL trajectories invariant; they can therefore be used to generate new feasible trajectories for training. Based on this data augmentation idea, we formulate a general framework, called Invariant Transform Experience Replay, which we present with two techniques: (i) Kaleidoscope Experience Replay, which exploits reflectional symmetries, and (ii) Goal-augmented Experience Replay, which takes advantage of lax goal definitions. In the Fetch tasks from OpenAI Gym, our experimental results show significant increases in learning speed and success rates. In particular, we attain 13-, 3-, and 5-fold speedups in the pushing, sliding, and pick-and-place tasks, respectively, in the multi-goal setting. Performance gains are also observed in similar tasks with obstacles, and we successfully deployed a trained policy on a real Baxter robot. Our work demonstrates that invariant transformations on RL trajectories are a promising methodology to speed up learning in deep RL.
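Both techniques amount to transforming stored transitions. The sketch below is a simplified illustration, not the authors' code. It rests on several hypothetical assumptions about the transition layout:

- The observation is a single 3D gripper position.
- The action is (dx, dy, dz, gripper).
- The reward is sparse: 0 when the achieved goal lies within a success radius of the goal, and -1 otherwise.
- The reflecting plane is the vertical plane y = y0.

Reflection preserves distances, so the reflected transition keeps its reward. Goal augmentation samples a virtual goal inside the success ball around the achieved goal; the lax goal definition also counts this as success.

```python
import math
import random

SUCCESS_TOL = 0.05  # hypothetical success radius (metres) around the goal


def reward(achieved, goal, tol=SUCCESS_TOL):
    """Sparse reward: 0 on success (within tol of the goal), -1 otherwise."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0


def reflect_point(p, y0=0.0):
    """Mirror a 3D point across the vertical plane y = y0."""
    x, y, z = p
    return (x, 2 * y0 - y, z)


def kaleidoscope(transition, y0=0.0):
    """Reflectional augmentation: mirror every position and the y-component of the action.

    Distances are preserved by reflection, so the sparse reward is unchanged.
    """
    dx, dy, dz, grip = transition["action"]
    return {
        "obs": reflect_point(transition["obs"], y0),
        "action": (dx, -dy, dz, grip),
        "achieved_goal": reflect_point(transition["achieved_goal"], y0),
        "goal": reflect_point(transition["goal"], y0),
        "reward": transition["reward"],
    }


def goal_augmented(transition, rng=random, tol=SUCCESS_TOL):
    """Relabel with a virtual goal sampled uniformly inside the success ball
    around the achieved goal (rejection sampling from the enclosing cube)."""
    while True:
        offset = [rng.uniform(-tol, tol) for _ in range(3)]
        if math.hypot(*offset) <= tol:
            break
    ax, ay, az = transition["achieved_goal"]
    goal = (ax + offset[0], ay + offset[1], az + offset[2])
    new = dict(transition, goal=goal)  # copy; the original transition is untouched
    new["reward"] = reward(new["achieved_goal"], goal, tol)
    return new
```

Both functions return new transitions, so they can be appended to the replay buffer alongside the originals.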
RO | Sep 12, 2018
Dynamic Interaction Probabilistic Movement Primitives
Shuangda Duan, Longxin Chen, Hongmin Wu et al.
Human-robot collaboration is on the rise. Robots must increasingly improve the efficiency and smoothness with which they assist humans by properly anticipating a human's intention. To do so, prediction models need to increase their accuracy and responsiveness. This work builds on top of Interaction Movement Primitives with phase estimation and reformulates the framework to use dynamic human-motion observations that constantly update anticipatory motions. The original framework only considers a single, fixed-duration, static human observation, which is used to perform only one anticipatory motion. Dynamic observations, with built-in phase estimation, yield a series of updated robot motion distributions. Co-activation is performed between the existing and the newest most probable robot motion distributions. This results in smooth anticipatory robot motions that are highly accurate and more responsive.
RO | Sep 11, 2018
Endowing Robots with Longer-term Autonomy by Recovering from External Disturbances in Manipulation through Grounded Anomaly Classification and Recovery Policies
Hongmin Wu, Shuangqi Luo, Longxin Chen et al.
Robot manipulation is increasingly poised to interact with humans in co-shared workspaces. Despite increasingly robust manipulation and control algorithms, failure modes continue to exist whenever models do not capture the dynamics of the unstructured environment. To obtain longer-term horizons in robot automation, robots must develop introspection and recovery abilities. We contribute a set of recovery policies to deal with anomalies produced by external disturbances, as well as anomaly classification through the use of non-parametric statistics with memoized variational inference with scalable adaptation. A recovery critic stands atop a tightly integrated, graph-based online motion-generation and introspection system that resolves a wide range of anomalous situations. Policies, skills, and introspection models are learned incrementally and contextually in a task. Two task-level recovery policies, re-enactment and adaptation, resolve accidental and persistent anomalies, respectively. The introspection system uses non-parametric priors along with Markov jump linear systems and memoized variational inference with scalable adaptation to learn a model from the data. In extensive real-robot experiments, various strenuous anomalous conditions were induced and resolved at different phases of a task and in different combinations. The system executes around-the-clock introspection and recovery and even elicited self-recovery when misclassifications occurred.
RO | Sep 4, 2018
Plastic Waste is Exponentially Filling our Oceans, but where are the Robots?
Juan Rojas
Plastic waste is filling our oceans at an exponential rate. The situation is catastrophic and has now garnered worldwide attention. Despite the catastrophic conditions, little to no robotics research is conducted on the identification, collection, sorting, and removal of plastic waste from oceans and rivers at the macro- and micro-scale. Only a scarce number of individual efforts can be found from private sources. This paper presents a cursory view of the current plastic waste catastrophe in our waters, associated robot research, and other efforts currently underway to address the issue. It also calls on the community to wait no longer to address the problem. Surely there is much potential for robots to help meet the challenges posed by the enormity of this problem.
RO | Sep 24, 2017
Fast, Robust, and Versatile Event Detection through HMM Belief State Gradient Measures
Shuangqi Luo, Hongmin Wu, Hongbin Lin et al.
Event detection is a critical feature in data-driven systems, as it assists with the identification of nominal and anomalous behavior. Event detection is increasingly relevant in robotics as robots operate with greater autonomy in increasingly unstructured environments. In this work, we present an accurate, robust, fast, and versatile measure for skill and anomaly identification. A theoretical proof establishes the link between the derivative of the log-likelihood of the HMM filtered belief state and the latest emission probabilities. The key insight is this inverse relationship, through which gradient analysis can be used for skill and anomaly identification. Our measure outperformed related state-of-the-art works across all metrics. The result is broadly applicable to domains that use HMMs for event detection.
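A standard scaled forward filter shows how the log-likelihood gradient relates to the latest emission probabilities. The minimal sketch below assumes a discrete-emission HMM with hypothetical parameters:

- `pi` is the initial state distribution.
- `A[i][j]` is the probability of moving from state i to state j.
- `B[j][o]` is the probability of emitting symbol o in state j.

Each per-step increment of the log-likelihood equals the log of the current emission probabilities averaged over the predicted belief state. `flag_anomalies` thresholds those increments using a hypothetical threshold. This is one simple way to turn the quantity into an event detector; it is not claimed to be the paper's exact measure.

```python
import math


def forward_loglik_increments(pi, A, B, observations):
    """Scaled HMM forward filter over discrete observations.

    Returns d_t = log p(o_t | o_1..o_{t-1}) for each step. The running sum of d_t
    is the log-likelihood, so d_t is its discrete-time derivative, and each d_t is
    the log of the emission probability of o_t averaged under the predicted belief.
    Assumes every observation has nonzero probability (otherwise log(0) fails).
    """
    n = len(pi)
    belief = None
    increments = []
    for t, o in enumerate(observations):
        if t == 0:
            predicted = list(pi)
        else:
            # One-step prediction: p(s_t = j | o_1..o_{t-1})
            predicted = [sum(belief[i] * A[i][j] for i in range(n)) for j in range(n)]
        joint = [predicted[j] * B[j][o] for j in range(n)]
        c = sum(joint)  # p(o_t | o_1..o_{t-1})
        increments.append(math.log(c))
        belief = [x / c for x in joint]  # filtered belief state p(s_t | o_1..o_t)
    return increments


def flag_anomalies(increments, threshold):
    """Indices where the log-likelihood gradient drops below a (hypothetical) threshold."""
    return [t for t, d in enumerate(increments) if d < threshold]


pi = [1.0, 0.0]
A = [[0.9, 0.1], [0.1, 0.9]]
B = [[0.9, 0.1], [0.2, 0.8]]
inc = forward_loglik_increments(pi, A, B, [0, 0, 1])
print(flag_anomalies(inc, math.log(0.5)))  # the unexpected symbol at t = 2 is flagged
```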
RO | Aug 8, 2017
Learning Human-Robot Collaboration Insights through the Integration of Muscle Activity in Interaction Motion Models
Longxin Chen, Juan Rojas, Shuangda Duan et al.
Recent progress in human-robot collaboration makes fast and fluid interactions possible, even when human observations are partial and occluded. Methods like Interaction Probabilistic Movement Primitives (ProMP) model human trajectories through motion capture systems. However, such representations do not properly model tasks where similar motions handle different objects. Under current approaches, a robot would not adapt its pose and dynamics for proper handling. We integrate Electromyography (EMG) into the Interaction ProMP framework and utilize muscular signals to augment the human observation representation. The contribution of our paper is increased task discernment when trajectories are similar but the tools are different and require the robot to adjust its pose for proper handling. Interaction ProMPs are used with an augmented vector that integrates muscle activity. Augmented time-normalized trajectories are used in training to learn correlation parameters, and robot motions are predicted by finding the best weight combination and temporal scaling for a task. Collaborative single-task scenarios with similar motions but different objects were used and compared. In one experiment only joint angles were recorded; in the other, EMG signals were additionally integrated. Task recognition was computed for both tasks. Observation state vectors augmented with EMG signals were able to completely identify differences across tasks, while the baseline method failed every time. Integrating EMG signals into collaborative tasks significantly increases the ability of the system to recognize nuances in the tasks that are otherwise imperceptible, by up to 74.6% in our studies. Furthermore, the integration of EMG signals for collaboration also opens the door to a wide class of human-robot physical interactions based on haptic communication that has been largely unexploited in the field.
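The augmentation step time-normalizes the joint angles and EMG signals and stacks them into one observation vector. It can be sketched as follows. This is a simplified illustration under stated assumptions: linear interpolation is used for time normalization, both streams are lists of per-sample frames, and the function names are hypothetical. Fitting the ProMP basis-function weights and conditioning on partial observations are not shown.

```python
def time_normalize(samples, n_out=100):
    """Linearly resample a trajectory (list of equal-length frames) to n_out frames.

    Assumes at least 2 input frames and n_out >= 2.
    """
    n_in = len(samples)
    out = []
    for k in range(n_out):
        s = k * (n_in - 1) / (n_out - 1)  # fractional index into the input
        i = min(int(s), n_in - 2)         # left neighbour, clamped for the last frame
        w = s - i                         # interpolation weight toward the right neighbour
        out.append([(1 - w) * a + w * b for a, b in zip(samples[i], samples[i + 1])])
    return out


def augment(joint_angles, emg, n_out=100):
    """Time-normalize both streams, then concatenate per frame: [joints..., emg...]."""
    q = time_normalize(joint_angles, n_out)
    e = time_normalize(emg, n_out)
    return [qa + ea for qa, ea in zip(q, e)]


# Two joint-angle frames and three EMG frames (different sampling rates) -> 3 aligned frames.
print(augment([[0.0, 1.0], [2.0, 3.0]], [[5.0], [7.0], [9.0]], 3))
```

Resampling both streams to a common length is what lets signals recorded at different rates share one observation vector per phase step.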
RO | Aug 1, 2017
Recovering from External Disturbances in Online Manipulation through State-Dependent Revertive Recovery Policies
Hongmin Wu, Hongbin Lin, Shuangqi Luo et al.
Robots are increasingly entering uncertain and unstructured environments. Within these, robots are bound to face unexpected external disturbances like accidental human or tool collisions. Robots must develop the capacity to respond to unexpected events: not only identifying the sudden anomaly, but also deciding how to handle it. In this work, we contribute a recovery policy that allows a robot to recover from various anomalous scenarios across different tasks and conditions in a consistent and robust fashion. The system organizes tasks as a sequence of nodes composed of internal modules such as motion generation and introspection. When an introspection module flags an anomaly, the recovery strategy is triggered and reverts the task execution by selecting a target node as a function of a state dependency chart. The new skill allows the robot to overcome the effects of the external disturbance and conclude the task. Our system recovers from accidental human and tool collisions in a number of tasks. Of particular importance, we test the robustness of the recovery system by triggering anomalies at each node in the task graph, showing robust recovery everywhere in the task. We also trigger multiple and repeated anomalies at each of the nodes of the task, showing that the recovery system can consistently recover anywhere in the presence of strong and pervasive anomalous conditions. Robust recovery systems will be key enablers for long-term autonomy in robot systems. Supplemental info including code, data, graphs, and result analysis can be found at [1].
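A small task-graph walker can illustrate the revert-on-anomaly logic. Everything here is hypothetical:

- The five node names.
- The dependency chart `REVERT_TARGET`, which gives the node to return to when an anomaly is flagged at a given node.
- The `anomaly_at` callable, which stands in for the introspection module.

It sketches the control flow the abstract describes, not the authors' system.

```python
# Hypothetical task graph: nodes executed in order, each naming a skill.
TASK = ["approach", "grasp", "lift", "transport", "place"]

# Hypothetical state-dependency chart: for each node, the node whose preconditions
# must be re-established if an anomaly is flagged while that node runs.
REVERT_TARGET = {
    "approach": "approach",
    "grasp": "approach",
    "lift": "grasp",
    "transport": "grasp",
    "place": "transport",
}


def run_task(task, revert_target, anomaly_at, max_recoveries=10):
    """Execute nodes in order; on an anomaly, jump back to the chart's target node.

    anomaly_at(node, recoveries_so_far) -> bool stands in for introspection.
    Returns the trace of executed nodes. Raises if the recovery budget is exceeded.
    """
    index = {name: k for k, name in enumerate(task)}
    trace, i, recoveries = [], 0, 0
    while i < len(task):
        node = task[i]
        trace.append(node)
        if anomaly_at(node, recoveries):
            recoveries += 1
            if recoveries > max_recoveries:
                raise RuntimeError("recovery budget exhausted")
            i = index[revert_target[node]]  # revert execution to the target node
            continue
        i += 1
    return trace


# A single collision while transporting sends execution back to "grasp".
print(run_task(TASK, REVERT_TARGET, lambda node, r: node == "transport" and r == 0))
```

The same walker lets one inject anomalies at every node, or repeatedly at one node, which mirrors the robustness tests described in the abstract.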
RO | May 24, 2017
Robot Introspection with Bayesian Nonparametric Vector Autoregressive Hidden Markov Models
Hongmin Wu, Hongbin Lin, Yisheng Guan et al.
Robot introspection, as opposed to anomaly detection typical in process monitoring, helps a robot understand what it is doing at all times. A robot should be able to identify its actions not only when failure or novelty occurs, but also as it executes any number of sub-tasks. As robots continue their quest of functioning in unstructured environments, it is imperative that they understand what it is they are actually doing, to render them more robust. This work investigates the ability of Bayesian nonparametric techniques on Markov switching processes to learn the complex dynamics typical of robot contact tasks. We study whether the Markov switching process, together with Bayesian priors, can outperform the modeling ability of its counterparts: an HMM with and without Bayesian priors. The work was tested in a snap assembly task characterized by high elastic forces. The task consists of an insertion subtask with very complex dynamics. Our approach showed a stronger ability to generalize and was able to better model the subtask with complex dynamics in a computationally efficient way. The modeling technique is also used to learn a growing library of robot skills, one that, when integrated with low-level control, allows for robot online decision making.
RO | Mar 11, 2017
A Vision-based Scheme for Kinematic Model Construction of Re-configurable Modular Robots
Kewei Lin, Juan Rojas, Yisheng Guan
Re-configurable modular robotic (RMR) systems are advantageous for their reconfigurability and versatility. A new modular robot can be built for a specific task by using modules as building blocks. However, constructing a kinematic model for a newly conceived robot requires significant work. Because the number of module types is finite, models of all module types can be built individually and stored in a database beforehand. With this a priori knowledge, the model construction process can be automated by detecting the modules and their interconnections. Previous literature proposed theoretical frameworks for constructing kinematic models of modular robots, assuming that such information was known a priori. While well-devised mechanisms and built-in sensors can be employed to detect these parameters automatically, they significantly complicate the module design and are thus expensive. In this paper, we propose a vision-based method to identify kinematic chains and automatically construct robot models for modular robots. Each module is affixed with augmented reality (AR) tags that are encoded with unique IDs. An image of a modular robot is taken, and the detected modules are recognized by querying a database that maintains all module information. The poses of detected modules are used to compute (i) the connections between modules and (ii) the joint angles of joint modules. Finally, the robot's serial-link chain is identified, and the kinematic model is constructed and visualized. Our experimental results validate the effectiveness of our approach. While implementation with only our RMR is shown, our method can be applied to other RMRs where self-identification is not possible.
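Computing module connections and joint angles from tag poses reduces to rigid-transform algebra. The sketch below makes three assumptions:

- Each detected AR tag yields a 4x4 pose in the camera frame.
- A revolute joint module rotates about the shared tag z-axis.
- Two modules count as connected when one tag sits at a known offset from the other.

`are_connected`, `expected_offset`, and the tolerance are hypothetical. Pure-Python matrices keep the example dependency-free; in practice a linear-algebra library would be used.

```python
import math


def rot_z(theta):
    """3x3 rotation about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def make_pose(R, p):
    """4x4 homogeneous transform from a 3x3 rotation R and a translation p."""
    return [R[0] + [p[0]], R[1] + [p[1]], R[2] + [p[2]], [0.0, 0.0, 0.0, 1.0]]


def matmul(A, B):
    """Product of two 4x4 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def invert_rigid(T):
    """Inverse of a rigid transform: rotation R^T, translation -R^T p."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    p = [T[i][3] for i in range(3)]
    q = [-sum(Rt[i][k] * p[k] for k in range(3)) for i in range(3)]
    return make_pose(Rt, q)


def relative_pose(T_cam_a, T_cam_b):
    """Pose of tag b expressed in tag a's frame: inv(T_cam_a) @ T_cam_b."""
    return matmul(invert_rigid(T_cam_a), T_cam_b)


def joint_angle_about_z(T_a_b):
    """Revolute joint angle, assuming the joint axis is the shared tag z-axis."""
    return math.atan2(T_a_b[1][0], T_a_b[0][0])


def are_connected(T_a_b, expected_offset, tol=0.01):
    """Hypothetical connection test: tag b sits at a known offset from tag a."""
    p = [T_a_b[i][3] for i in range(3)]
    return math.dist(p, expected_offset) <= tol
```

For example, if tag b is observed 0.1 m above tag a and rotated 0.7 rad about z, `relative_pose` recovers that local transform regardless of where the camera sees tag a. `joint_angle_about_z` then returns about 0.7.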
RO | Mar 11, 2017
A 3D Object Detection and Pose Estimation Pipeline Using RGB-D Images
Ruotao He, Juan Rojas, Yisheng Guan
3D object detection and pose estimation have been studied extensively in recent decades for their potential applications in robotics. However, challenges remain when detecting multiple objects while retaining a low false positive rate in cluttered environments. This paper proposes a robust 3D object detection and pose estimation pipeline based on RGB-D images, which can detect multiple objects simultaneously while reducing false positives. Detection begins with template matching and yields a set of template matches. A clustering algorithm then groups templates with similar spatial locations and produces multiple-object hypotheses. A scoring function evaluates the hypotheses using their associated templates, and non-maximum suppression is adopted to remove duplicate results based on the scores. Finally, a combination of point cloud processing algorithms is used to compute the objects' 3D poses. Existing object hypotheses are verified by computing the overlap between model and scene points. Experiments demonstrate that our approach provides results competitive with the state of the art and can be applied to robot random bin-picking.
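The standard greedy algorithm illustrates the non-maximum suppression step. For simplicity, the sketch assumes each hypothesis is a (score, box) pair with an axis-aligned 2D box in image coordinates, and it uses an IoU threshold of 0.5. The paper builds its hypotheses from clustered templates, so its overlap criterion may differ.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(hypotheses, iou_threshold=0.5):
    """Greedy non-maximum suppression over (score, box) hypotheses.

    Repeatedly keeps the highest-scoring hypothesis and discards every remaining
    one that overlaps it by more than iou_threshold.
    """
    remaining = sorted(hypotheses, key=lambda h: h[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [h for h in remaining if iou(h[1], best[1]) <= iou_threshold]
    return kept


hyps = [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 11, 11)), (0.7, (20, 20, 30, 30))]
print(nms(hyps))  # the 0.8 box overlaps the 0.9 box (IoU ~0.68) and is removed
```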
RO | Feb 28, 2017
Online Robot Introspection via Wrench-based Action Grammars
Juan Rojas, Shuangqi Luo, Dingqiao Zhu et al.
Robotic failure is all too common in unstructured robot tasks. Despite well-designed controllers, robots often fail due to unexpected events. How do robots measure unexpected events? Many do not. Most robots are driven by the sense-plan-act paradigm; more recently, however, robots are adopting a sense-plan-act-verify paradigm. In this work, we present a principled methodology to bootstrap online robot introspection for contact tasks. In effect, we are trying to enable the robot to answer the questions: what did I do? Is my behavior as expected or not? To this end, we analyze noisy wrench data and postulate that it inherently contains patterns that can be effectively represented by a vocabulary. The vocabulary is generated by segmenting and encoding the data. When the wrench information represents a sequence of sub-tasks, we can think of the vocabulary forming a sentence (a set of words with grammar rules) for a given sub-task, allowing the sub-task to be uniquely represented. The grammar, which can also include unexpected events, was classified in offline and online scenarios as well as in simulated and real robot experiments. Multiclass Support Vector Machines (SVMs) were used offline, while online probabilistic SVMs were used to give temporal confidence to the introspection result. The contribution of our work is the presentation of a generalizable online semantic scheme that enables a robot to understand its high-level state, whether nominal or abnormal. It is shown to work in offline and online scenarios for a particularly challenging contact task: snap assemblies. We perform the snap assembly in simulated and real one-arm experiments and in a simulated two-arm experiment. This verification mechanism can be used by high-level planners or reasoning systems to enable intelligent failure recovery or to determine the next optimal manipulation skill to use.
RO | Sep 16, 2016
Robot Introspection via Wrench-based Action Grammars
Juan Rojas, Zhengjie Huang, Shuangqi Luo et al.
Robotic failure is all too common in unstructured robot tasks. Despite well-designed controllers, robots often fail due to unexpected events. How do robots measure unexpected events? Many do not. Most robots are driven by the sense-plan-act paradigm; more recently, however, robots are working with a sense-plan-act-verify paradigm. In this work we present a principled methodology to bootstrap robot introspection for contact tasks. In effect, we are trying to answer the question: what did the robot do? To this end, we hypothesize that all noisy wrench data inherently contains patterns that can be effectively represented by a vocabulary. The vocabulary is generated by meaningfully segmenting the data and then encoding it. When the wrench information represents a sequence of sub-tasks, we can think of the vocabulary as forming sets of words or sentences, such that each sub-task is uniquely represented by a word set. Such sets can be classified using statistical or machine learning techniques. We use SVMs and Mondrian Forests to classify contact tasks both in simulation and on real robots for one- and dual-arm scenarios, showing the general robustness of the approach. The contribution of our work is the presentation of a simple but generalizable semantic scheme that enables a robot to understand its high-level state. This verification mechanism can also provide feedback for high-level planners or reasoning systems that use semantic descriptors. The code, data, and other supporting documentation can be found at: http://www.juanrojas.net/2017icra_wrench_introspection.
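One way to picture the segment-then-encode step has three parts. First, cut a single wrench axis into fixed windows. Next, label each window by its slope. Finally, count label occurrences to form a feature vector that a classifier such as an SVM could consume. The window length, slope thresholds, and label names (`bneg`, `neg`, `const`, `pos`, `bpos`) below are hypothetical choices for illustration, not the vocabulary used in the paper.

```python
# Hypothetical label vocabulary, ordered from large negative to large positive slope.
VOCAB = ("bneg", "neg", "const", "pos", "bpos")


def encode_primitives(signal, window=5, flat_tol=0.1, big=1.0):
    """Segment a 1D wrench signal into fixed windows and label each by its slope.

    Slope is measured in signal units per sample (hypothetical thresholds):
    |slope| <= flat_tol -> 'const'; |slope| >= big -> 'bpos'/'bneg'; else 'pos'/'neg'.
    A trailing partial window is ignored. Requires window >= 2.
    """
    labels = []
    for start in range(0, len(signal) - window + 1, window):
        seg = signal[start:start + window]
        slope = (seg[-1] - seg[0]) / (window - 1)
        if abs(slope) <= flat_tol:
            labels.append("const")
        elif slope > 0:
            labels.append("bpos" if slope >= big else "pos")
        else:
            labels.append("bneg" if slope <= -big else "neg")
    return labels


def compress(labels):
    """Merge consecutive repeated labels into (label, count) 'words'."""
    words = []
    for lab in labels:
        if words and words[-1][0] == lab:
            words[-1] = (lab, words[-1][1] + 1)
        else:
            words.append((lab, 1))
    return words


def histogram(labels):
    """Bag-of-labels feature vector in VOCAB order."""
    return [labels.count(v) for v in VOCAB]


force = [0, 0, 0, 0, 0, 0, 2, 4, 6, 8, 8, 8, 8, 8, 8, 8, 7.5, 7, 6.5, 6]
labels = encode_primitives(force)
print(labels, compress(labels), histogram(labels))
```

The compressed word sequence preserves the order of sub-task events. The histogram discards order but gives a fixed-length vector, so it suits off-the-shelf classifiers.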
RO | Sep 16, 2016
Robot Contact Task State Estimation via Action Grammars
Juan Rojas, Zhengjie Huang, Kensuke Harada
Uncertainty is a major difficulty in endowing robots with autonomy. Robots often fail due to unexpected events. Robot contact tasks are often designed to empirically look for force thresholds that define state transitions in a Markov chain or finite state machine. Such designs are prone to failure in unstructured environments when, due to external disturbances or erroneous models, such thresholds are met and lead to false-positive state transitions. The focus of this paper is to perform high-level state estimation of robot behaviors and task output for robot contact tasks. Our approach encodes raw low-level 3D Cartesian trajectories and converts them into high-level (HL) action grammars. Cartesian trajectories can be segmented and encoded in a way that preserves their dynamic properties, or "texture". Once an action grammar is generated, a classifier is trained to detect current behaviors and ultimately the task output. The system executed HL state estimation for task output verification with an accuracy of 86%, and behavior monitoring with an average accuracy of 72%. The significance of the work is the transformation of difficult-to-use raw low-level data into HL data that enables robust behavior and task monitoring. Monitoring is useful for failure correction or other deliberation in high-level planning, programming by demonstration, and human-robot interaction, to name a few.