Yisheng Guan

RO
17papers
122citations
Novelty47%
AI Score28

17 Papers

CVJul 20, 2022
Perspective Phase Angle Model for Polarimetric 3D Reconstruction

Guangcheng Chen, Li He, Yisheng Guan et al.

Current polarimetric 3D reconstruction methods, including those in the well-established shape from polarization literature, are all developed under the orthographic projection assumption. In the case of a large field of view, however, this assumption does not hold and may result in significant reconstruction errors in methods that make this assumption. To address this problem, we present the perspective phase angle (PPA) model that is applicable to perspective cameras. Compared with the orthographic model, the proposed PPA model accurately describes the relationship between polarization phase angle and surface normal under perspective projection. In addition, the PPA model makes it possible to estimate surface normals from only one single-view phase angle map and does not suffer from the so-called $π$-ambiguity problem. Experiments on real data show that the PPA model is more accurate for surface normal estimation with a perspective camera than the orthographic model.

CVApr 15, 2022
Condition-Invariant and Compact Visual Place Description by Convolutional Autoencoder

Hanjing Ye, Weinan Chen, Jingwen Yu et al.

Visual place recognition (VPR) in condition-varying environments is still an open problem. Popular solutions are CNN-based image descriptors, which have been shown to outperform traditional image descriptors based on hand-crafted visual features. However, there are two drawbacks of current CNN-based descriptors: a) their high dimension and b) lack of generalization, leading to low efficiency and poor performance in applications. In this paper, we propose to use a convolutional autoencoder (CAE) to tackle this problem. We employ a high-level layer of a pre-trained CNN to generate features, and train a CAE to map the features to a low-dimensional space to improve the condition invariance property of the descriptor and reduce its dimension at the same time. We verify our method in three challenging datasets involving significant illumination changes, and our method is shown to be superior to the state-of-the-art. For the benefit of the community, we make public the source code.

CVAug 5, 2020Code
Pose-based Modular Network for Human-Object Interaction Detection

Zhijun Liang, Junfa Liu, Yisheng Guan et al.

Human-object interaction(HOI) detection is a critical task in scene understanding. The goal is to infer the triplet <subject, predicate, object> in a scene. In this work, we note that the human pose itself as well as the relative spatial information of the human pose with respect to the target object can provide informative cues for HOI detection. We contribute a Pose-based Modular Network (PMN) which explores the absolute pose features and relative spatial pose features to improve HOI detection and is fully compatible with existing networks. Our module consists of a branch that first processes the relative spatial pose features of each joint independently. Another branch updates the absolute pose features via fully connected graph structures. The processed pose features are then fed into an action classifier. To evaluate our proposed method, we combine the module with the state-of-the-art model named VS-GATs and obtain significant improvement on two public benchmarks: V-COCO and HICO-DET, which shows its efficacy and flexibility. Code is available at \url{https://github.com/birlrobotics/PMN}.

LGJan 7, 2020Code
Visual-Semantic Graph Attention Networks for Human-Object Interaction Detection

Zhijun Liang, Juan Rojas, Junfa Liu et al.

In scene understanding, robotics benefit from not only detecting individual scene instances but also from learning their possible interactions. Human-Object Interaction (HOI) Detection infers the action predicate on a <human, predicate, object> triplet. Contextual information has been found critical in inferring interactions. However, most works only use local features from single human-object pair for inference. Few works have studied the disambiguating contribution of subsidiary relations made available via graph networks. Similarly, few have learned to effectively leverage visual cues along with the intrinsic semantic regularities contained in HOIs. We contribute a dual-graph attention network that effectively aggregates contextual visual, spatial, and semantic information dynamically from primary human-object relations as well as subsidiary relations through attention mechanisms for strong disambiguating power. We achieve comparable results on two benchmarks: V-COCO and HICO-DET. Code is available at \url{https://github.com/birlrobotics/vs-gats}.

CVJun 17, 2024
SWCF-Net: Similarity-weighted Convolution and Local-global Fusion for Efficient Large-scale Point Cloud Semantic Segmentation

Zhenchao Lin, Li He, Hongqiang Yang et al.

Large-scale point cloud consists of a multitude of individual objects, thereby encompassing rich structural and underlying semantic contextual information, resulting in a challenging problem in efficiently segmenting a point cloud. Most existing researches mainly focus on capturing intricate local features without giving due consideration to global ones, thus failing to leverage semantic context. In this paper, we propose a Similarity-Weighted Convolution and local-global Fusion Network, named SWCF-Net, which takes into account both local and global features. We propose a Similarity-Weighted Convolution (SWConv) to effectively extract local features, where similarity weights are incorporated into the convolution operation to enhance the generalization capabilities. Then, we employ a downsampling operation on the K and V channels within the attention module, thereby reducing the quadratic complexity to linear, enabling the Transformer to deal with large-scale point clouds. At last, orthogonal components are extracted in the global features and then aggregated with local features, thereby eliminating redundant information between local and global features and consequently promoting efficiency. We evaluate SWCF-Net on large-scale outdoor datasets SemanticKITTI and Toronto3D. Our experimental results demonstrate the effectiveness of the proposed network. Our method achieves a competitive result with less computational cost, and is able to handle large-scale point clouds efficiently.

ROOct 16, 2020
Hyperparameter Auto-tuning in Self-Supervised Robotic Learning

Jiancong Huang, Juan Rojas, Matthieu Zimmer et al.

Policy optimization in reinforcement learning requires the selection of numerous hyperparameters across different environments. Fixing them incorrectly may negatively impact optimization performance leading notably to insufficient or redundant learning. Insufficient learning (due to convergence to local optima) results in under-performing policies whilst redundant learning wastes time and resources. The effects are further exacerbated when using single policies to solve multi-task learning problems. Observing that the Evidence Lower Bound (ELBO) used in Variational Auto-Encoders correlates with the diversity of image samples, we propose an auto-tuning technique based on the ELBO for self-supervised reinforcement learning. Our approach can auto-tune three hyperparameters: the replay buffer size, the number of policy gradient updates during each epoch, and the number of exploration steps during each epoch. We use a state-of-the-art self-supervised robot learning framework (Reinforcement Learning with Imagined Goals (RIG) using Soft Actor-Critic) as baseline for experimental verification. Experiments show that our method can auto-tune online and yields the best performance at a fraction of the time and computational resources. Code, video, and appendix for simulated and real-robot experiments can be found at the project page \url{www.JuanRojas.net/autotune}.

CVMar 11, 2020
A Graph Attention Spatio-temporal Convolutional Network for 3D Human Pose Estimation in Video

Junfa Liu, Juan Rojas, Zhijun Liang et al.

Spatio-temporal information is key to resolve occlusion and depth ambiguity in 3D pose estimation. Previous methods have focused on either temporal contexts or local-to-global architectures that embed fixed-length spatio-temporal information. To date, there have not been effective proposals to simultaneously and flexibly capture varying spatio-temporal sequences and effectively achieves real-time 3D pose estimation. In this work, we improve the learning of kinematic constraints in the human skeleton: posture, local kinematic connections, and symmetry by modeling local and global spatial information via attention mechanisms. To adapt to single- and multi-frame estimation, the dilated temporal model is employed to process varying skeleton sequences. Also, importantly, we carefully design the interleaving of spatial semantics with temporal dependencies to achieve a synergistic effect. To this end, we propose a simple yet effective graph attention spatio-temporal convolutional network (GAST-Net) that comprises of interleaved temporal convolutional and graph attention blocks. Experiments on two challenging benchmark datasets (Human3.6M and HumanEva-I) and YouTube videos demonstrate that our approach effectively mitigates depth ambiguity and self-occlusion, generalizes to half upper body estimation, and achieves competitive performance on 2D-to-3D video pose estimation. Code, video, and supplementary information is available at: \href{http://www.juanrojas.net/gast/}{http://www.juanrojas.net/gast/}

CVDec 17, 2019
Large-scale Multi-modal Person Identification in Real Unconstrained Environments

Jiajie Ye, Yisheng Guan, Junfa Liu et al.

Person identification (P-ID) under real unconstrained noisy environments is a huge challenge. In multiple-feature learning with Deep Convolutional Neural Networks (DCNNs) or Machine Learning method for large-scale person identification in the wild, the key is to design an appropriate strategy for decision layer fusion or feature layer fusion which can enhance discriminative power. It is necessary to extract different types of valid features and establish a reasonable framework to fuse different types of information. In traditional methods, different persons are identified based on single modal features to identify, such as face feature, audio feature, and head feature. These traditional methods cannot realize a highly accurate level of person identification in real unconstrained environments. The study aims to propose a fusion module to fuse multi-modal features for person identification in real unconstrained environments.

ROSep 24, 2019
Invariant Transform Experience Replay: Data Augmentation for Deep Reinforcement Learning

Yijiong Lin, Jiancong Huang, Matthieu Zimmer et al.

Deep Reinforcement Learning (RL) is a promising approach for adaptive robot control, but its current application to robotics is currently hindered by high sample requirements. To alleviate this issue, we propose to exploit the symmetries present in robotic tasks. Intuitively, symmetries from observed trajectories define transformations that leave the space of feasible RL trajectories invariant and can be used to generate new feasible trajectories, which could be used for training. Based on this data augmentation idea, we formulate a general framework, called Invariant Transform Experience Replay that we present with two techniques: (i) Kaleidoscope Experience Replay exploits reflectional symmetries and (ii) Goal-augmented Experience Replay which takes advantage of lax goal definitions. In the Fetch tasks from OpenAI Gym, our experimental results show significant increases in learning rates and success rates. Particularly, we attain a 13, 3, and 5 times speedup in the pushing, sliding, and pick-and-place tasks respectively in the multi-goal setting. Performance gains are also observed in similar tasks with obstacles and we successfully deployed a trained policy on a real Baxter robot. Our work demonstrates that invariant transformations on RL trajectories are a promising methodology to speed up learning in deep RL.

ROSep 11, 2018
Endowing Robots with Longer-term Autonomy by Recovering from External Disturbances in Manipulation through Grounded Anomaly Classification and Recovery Policies

Hongmin Wu, Shuangqi Luo, Longxin Chen et al.

Robot manipulation is increasingly poised to interact with humans in co-shared workspaces. Despite increasingly robust manipulation and control algorithms, failure modes continue to exist whenever models do not capture the dynamics of the unstructured environment. To obtain longer-term horizons in robot automation, robots must develop introspection and recovery abilities. We contribute a set of recovery policies to deal with anomalies produced by external disturbances as well as anomaly classification through the use of non-parametric statistics with memoized variational inference with scalable adaptation. A recovery critic stands atop of a tightly-integrated, graph-based online motion-generation and introspection system that resolves a wide range of anomalous situations. Policies, skills, and introspection models are learned incrementally and contextually in a task. Two task-level recovery policies: re-enactment and adaptation resolve accidental and persistent anomalies respectively. The introspection system uses non-parametric priors along with Markov jump linear systems and memoized variational inference with scalable adaptation to learn a model from the data. Extensive real-robot experimentation with various strenuous anomalous conditions is induced and resolved at different phases of a task and in different combinations. The system executes around-the-clock introspection and recovery and even elicited self-recovery when misclassifications occurred.

ROJul 3, 2018
Submap-based Pose-graph Visual SLAM: A Robust Visual Exploration and Localization System

Weinan Chen, Lei Zhu, Yisheng Guan et al.

For VSLAM (Visual Simultaneous Localization and Mapping), localization is a challenging task, especially for some challenging situations: textureless frames, motion blur, etc.. To build a robust exploration and localization system in a given space or environment, a submap-based VSLAM system is proposed in this paper. Our system uses a submap back-end and a visual front-end. The main advantage of our system is its robustness with respect to tracking failure, a common problem in current VSLAM algorithms. The robustness of our system is compared with the state-of-the-art in terms of average tracking percentage. The precision of our system is also evaluated in terms of ATE (absolute trajectory error) RMSE (root mean square error) comparing the state-of-the-art. The ability of our system in solving the `kidnapped' problem is demonstrated. Our system can improve the robustness of visual localization in challenging situations.

ROSep 24, 2017
Fast, Robust, and Versatile Event Detection through HMM Belief State Gradient Measures

Shuangqi Luo, Hongmin Wu, Hongbin Lin et al.

Event detection is a critical feature in data-driven systems as it assists with the identification of nominal and anomalous behavior. Event detection is increasingly relevant in robotics as robots operate with greater autonomy in increasingly unstructured environments. In this work, we present an accurate, robust, fast, and versatile measure for skill and anomaly identification. A theoretical proof establishes the link between the derivative of the log-likelihood of the HMM filtered belief state and the latest emission probabilities. The key insight is the inverse relationship in which gradient analysis is used for skill and anomaly identification. Our measure showed better performance across all metrics than related state-of-the art works. The result is broadly applicable to domains that use HMMs for event detection.

ROAug 8, 2017
Learning Human-Robot Collaboration Insights through the Integration of Muscle Activity in Interaction Motion Models

Longxin Chen, Juan Rojas, Shuangda Duan et al.

Recent progress in human-robot collaboration makes fast and fluid interactions possible, even when human observations are partial and occluded. Methods like Interaction Probabilistic Movement Primitives (ProMP) model human trajectories through motion capture systems. However, such representation does not properly model tasks where similar motions handle different objects. Under current approaches, a robot would not adapt its pose and dynamics for proper handling. We integrate the use of Electromyography (EMG) into the Interaction ProMP framework and utilize muscular signals to augment the human observation representation. The contribution of our paper is increased task discernment when trajectories are similar but tools are different and require the robot to adjust its pose for proper handling. Interaction ProMPs are used with an augmented vector that integrates muscle activity. Augmented time-normalized trajectories are used in training to learn correlation parameters and robot motions are predicted by finding the best weight combination and temporal scaling for a task. Collaborative single task scenarios with similar motions but different objects were used and compared. For one experiment only joint angles were recorded, for the other EMG signals were additionally integrated. Task recognition was computed for both tasks. Observation state vectors with augmented EMG signals were able to completely identify differences across tasks, while the baseline method failed every time. Integrating EMG signals into collaborative tasks significantly increases the ability of the system to recognize nuances in the tasks that are otherwise imperceptible, up to 74.6% in our studies. Furthermore, the integration of EMG signals for collaboration also opens the door to a wide class of human-robot physical interactions based on haptic communication that has been largely unexploited in the field.

ROAug 1, 2017
Recovering from External Disturbances in Online Manipulation through State-Dependent Revertive Recovery Policies

Hongmin Wu, Hongbin Lin, Shuangqi Luo et al.

Robots are increasingly entering uncertain and unstructured environments. Within these, robots are bound to face unexpected external disturbances like accidental human or tool collisions. Robots must develop the capacity to respond to unexpected events. That is not only identifying the sudden anomaly, but also deciding how to handle it. In this work, we contribute a recovery policy that allows a robot to recovery from various anomalous scenarios across different tasks and conditions in a consistent and robust fashion. The system organizes tasks as a sequence of nodes composed of internal modules such as motion generation and introspection. When an introspection module flags an anomaly, the recovery strategy is triggered and reverts the task execution by selecting a target node as a function of a state dependency chart. The new skill allows the robot to overcome the effects of the external disturbance and conclude the task. Our system recovers from accidental human and tool collisions in a number of tasks. Of particular importance is the fact that we test the robustness of the recovery system by triggering anomalies at each node in the task graph showing robust recovery everywhere in the task. We also trigger multiple and repeated anomalies at each of the nodes of the task showing that the recovery system can consistently recover anywhere in the presence of strong and pervasive anomalous conditions. Robust recovery systems will be key enablers for long-term autonomy in robot systems. Supplemental info including code, data, graphs, and result analysis can be found at [1].

ROMay 24, 2017
Robot Introspection with Bayesian Nonparametric Vector Autoregressive Hidden Markov Models

Hongmin Wu, Hongbin Lin, Yisheng Guan et al.

Robot introspection, as opposed to anomaly detection typical in process monitoring, helps a robot understand what it is doing at all times. A robot should be able to identify its actions not only when failure or novelty occurs, but also as it executes any number of sub-tasks. As robots continue their quest of functioning in unstructured environments, it is imperative they understand what is it that they are actually doing to render them more robust. This work investigates the modeling ability of Bayesian nonparametric techniques on Markov Switching Process to learn complex dynamics typical in robot contact tasks. We study whether the Markov switching process, together with Bayesian priors can outperform the modeling ability of its counterparts: an HMM with Bayesian priors and without. The work was tested in a snap assembly task characterized by high elastic forces. The task consists of an insertion subtask with very complex dynamics. Our approach showed a stronger ability to generalize and was able to better model the subtask with complex dynamics in a computationally efficient way. The modeling technique is also used to learn a growing library of robot skills, one that when integrated with low-level control allows for robot online decision making.

ROMar 11, 2017
A Vision-based Scheme for Kinematic Model Construction of Re-configurable Modular Robots

Kewei Lin, Juan Rojas, Yisheng Guan

Re-configurable modular robotic (RMR) systems are advantageous for their reconfigurability and versatility. A new modular robot can be built for a specific task by using modules as building blocks. However, constructing a kinematic model for a newly conceived robot requires significant work. Due to the finite size of module-types, models of all module-types can be built individually and stored in a database beforehand. With this priori knowledge, the model construction process can be automated by detecting the modules and their corresponding interconnections. Previous literature proposed theoretical frameworks for constructing kinematic models of modular robots, assuming that such information was known a priori. While well-devised mechanisms and built-in sensors can be employed to detect these parameters automatically, they significantly complicate the module design and thus are expensive. In this paper, we propose a vision-based method to identify kinematic chains and automatically construct robot models for modular robots. Each module is affixed with augmented reality (AR) tags that are encoded with unique IDs. An image of a modular robot is taken and the detected modules are recognized by querying a database that maintains all module information. The poses of detected modules are used to compute: (i) the connection between modules and (ii) joint angles of joint-modules. Finally, the robot serial-link chain is identified and the kinematic model constructed and visualized. Our experimental results validate the effectiveness of our approach. While implementation with only our RMR is shown, our method can be applied to other RMRs where self-identification is not possible.

ROMar 11, 2017
A 3D Object Detection and Pose Estimation Pipeline Using RGB-D Images

Ruotao He, Juan Rojas, Yisheng Guan

3D object detection and pose estimation has been studied extensively in recent decades for its potential applications in robotics. However, there still remains challenges when we aim at detecting multiple objects while retaining low false positive rate in cluttered environments. This paper proposes a robust 3D object detection and pose estimation pipeline based on RGB-D images, which can detect multiple objects simultaneously while reducing false positives. Detection begins with template matching and yields a set of template matches. A clustering algorithm then groups templates of similar spatial location and produces multiple-object hypotheses. A scoring function evaluates the hypotheses using their associated templates and non-maximum suppression is adopted to remove duplicate results based on the scores. Finally, a combination of point cloud processing algorithms are used to compute objects' 3D poses. Existing object hypotheses are verified by computing the overlap between model and scene points. Experiments demonstrate that our approach provides competitive results comparable to the state-of-the-arts and can be applied to robot random bin-picking.