LG · Oct 22, 2023
Robust Visual Imitation Learning with Inverse Dynamics Representations
Siyuan Li, Xun Wang, Rongchang Zuo et al.
Imitation learning (IL) has achieved considerable success in solving complex sequential decision-making problems. However, most current IL methods assume that the environment in which the policy is learned is the same as the environment in which the expert dataset was collected. Consequently, these methods may fail when the learning and expert environments differ even slightly, especially in challenging problems with high-dimensional image observations. In real-world scenarios, however, it is rarely possible to collect expert trajectories in precisely the target learning environment. To address this challenge, we propose a novel robust imitation learning approach that uses an inverse dynamics state representation learning objective to align the expert environment and the learning environment. Building on this abstract state representation, we design an effective reward function that measures the similarity between behavior data and expert data not only element-wise but also at the trajectory level. We conduct extensive experiments to evaluate the proposed approach under various visual perturbations and in diverse visual control tasks. Our approach achieves near-expert performance in most environments and significantly outperforms state-of-the-art visual IL methods and robust IL methods.
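To make "element-wise and trajectory-level" similarity concrete, here is a minimal sketch. It assumes states have already been mapped to embedding vectors (for example, by an encoder trained with an inverse dynamics objective) and uses a hypothetical scoring rule: cosine similarity to the closest expert embedding as the element-wise term, cosine similarity between mean trajectory embeddings as the trajectory-level term, and a hypothetical weight `alpha` to mix them. The function names and the mixing rule are illustrative assumptions, not the reward defined in the paper.

```python
import math
from typing import List, Sequence

Embedding = Sequence[float]


def cosine(a: Embedding, b: Embedding) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_embedding(traj: Sequence[Embedding]) -> List[float]:
    """Average the per-step embeddings of a trajectory."""
    dim = len(traj[0])
    return [sum(z[i] for z in traj) / len(traj) for i in range(dim)]


def element_wise_score(z: Embedding, expert_traj: Sequence[Embedding]) -> float:
    """Hypothetical element-wise term: similarity to the closest expert embedding."""
    return max(cosine(z, e) for e in expert_traj)


def trajectory_score(agent_traj: Sequence[Embedding], expert_traj: Sequence[Embedding]) -> float:
    """Hypothetical trajectory-level term: similarity of the mean embeddings."""
    return cosine(mean_embedding(agent_traj), mean_embedding(expert_traj))


def imitation_rewards(
    agent_traj: Sequence[Embedding],
    expert_traj: Sequence[Embedding],
    alpha: float = 0.5,
) -> List[float]:
    """Per-step rewards mixing both terms with the hypothetical weight alpha."""
    traj_term = trajectory_score(agent_traj, expert_traj)
    return [
        alpha * element_wise_score(z, expert_traj) + (1.0 - alpha) * traj_term
        for z in agent_traj
    ]


if __name__ == "__main__":
    expert = [[1.0, 0.0], [0.0, 1.0]]
    print(imitation_rewards(expert, expert))  # both rewards very close to 1.0
    print(imitation_rewards([[1.0, 0.0]], [[0.0, 1.0]]))  # [0.0]
```

Pooling a trajectory into a single mean vector is the simplest trajectory-level comparison and ignores ordering; sequence-aware distances such as dynamic time warping would be a natural alternative if step order matters.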
RO · Apr 29, 2021
REGRAD: A Large-Scale Relational Grasp Dataset for Safe and Object-Specific Robotic Grasping in Clutter
Hanbo Zhang, Deyu Yang, Han Wang et al.
Despite the impressive progress achieved in robotic grasping, robots are still not skilled at sophisticated tasks (e.g., searching for and grasping a specified target in clutter). Such tasks involve not only grasping but also comprehensive perception of the world (e.g., object relationships). Recent results are encouraging, demonstrating that high-level concepts can be understood through learning. However, such algorithms are usually data-intensive, and the lack of data severely limits their performance. In this paper, we present a new dataset named REGRAD for learning relationships among objects and grasps. We collect annotations of object poses, segmentations, grasps, and relationships for target-driven relational grasping tasks. The dataset is provided both as 2D images and as 3D point clouds. Moreover, since all the data are generated automatically, new objects can be freely imported for data generation. We also release a real-world validation dataset to evaluate the sim-to-real performance of models trained on REGRAD. Finally, we conduct a series of experiments showing that models trained on REGRAD generalize well to realistic scenarios in terms of both relationship and grasp detection. Our dataset and code can be found at: https://github.com/poisonwine/REGRAD
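Relationship annotations matter for target-driven grasping because a target buried under other objects cannot be grasped safely until those objects are removed. The sketch below assumes a hypothetical annotation format, a list of `(upper, lower)` pairs meaning `upper` rests on `lower`, which is not necessarily REGRAD's actual schema. It derives a removal order with a post-order depth-first search, so the top-most objects come first and the target comes last.

```python
from typing import Dict, List, Set, Tuple


def grasp_sequence(target: str, on_top_of: List[Tuple[str, str]]) -> List[str]:
    """Return the objects to grasp, in order, ending with the target.

    `on_top_of` holds hypothetical (upper, lower) pairs: `upper` rests on `lower`.
    """
    # Map each object to the objects resting directly on it.
    resting_on: Dict[str, List[str]] = {}
    for upper, lower in on_top_of:
        resting_on.setdefault(lower, []).append(upper)

    order: List[str] = []
    visited: Set[str] = set()

    def visit(obj: str) -> None:
        # Skip objects already handled; this also stops infinite recursion
        # if the annotations are inconsistent and contain a cycle.
        if obj in visited:
            return
        visited.add(obj)
        # Everything stacked on `obj` must be removed before `obj` itself.
        for upper in resting_on.get(obj, []):
            visit(upper)
        order.append(obj)

    visit(target)
    return order


if __name__ == "__main__":
    relations = [("cup", "book"), ("book", "box"), ("pen", "box")]
    print(grasp_sequence("box", relations))  # ['cup', 'book', 'pen', 'box']
```

Objects that are not stacked on the target (directly or indirectly) never appear in the sequence, which is what makes the grasp object-specific rather than clearing the whole scene.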
RO · Sep 18, 2021
Density-based Curriculum for Multi-goal Reinforcement Learning with Sparse Rewards
Deyu Yang, Hanbo Zhang, Xuguang Lan et al.
Multi-goal reinforcement learning (RL) aims to enable an agent to accomplish multi-goal tasks, which is of great importance for learning scalable robotic manipulation skills. However, reward engineering in multi-goal RL typically requires strenuous effort. Moreover, engineered rewards inevitably introduce bias that can make the final policy suboptimal. Sparse rewards offer a simple yet effective way to overcome these limitations. Nevertheless, they harm exploration efficiency and can even prevent the policy from converging. In this paper, we propose a density-based curriculum learning method for efficient exploration with sparse rewards and better generalization to the desired goal distribution. Intuitively, our method encourages the robot to gradually broaden the frontier of its abilities in directions that cover the entire desired goal space as fully and as quickly as possible. To further improve data efficiency and generality, we augment the goals and transitions within the allowed region during training. Finally, we evaluate our method on diverse variants of benchmark manipulation tasks that are challenging for existing methods. Empirical results show that our method outperforms state-of-the-art baselines in terms of both data efficiency and success rate.
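One common way to build a density-based curriculum is to estimate how densely the agent's achieved goals cover the goal space and to prefer candidate goals in low-density regions, which lie beyond what the agent already does reliably. The sketch below uses a Gaussian kernel density estimate with a hypothetical fixed `bandwidth` and simply picks the `k` lowest-density candidates; both the selection rule and the names are illustrative assumptions and may differ from the criterion used in the paper.

```python
import math
from typing import List, Sequence

Goal = Sequence[float]


def kde_density(x: Goal, samples: Sequence[Goal], bandwidth: float) -> float:
    """Gaussian kernel density estimate at x from achieved-goal samples."""
    dim = len(x)
    norm = (2.0 * math.pi * bandwidth ** 2) ** (dim / 2.0)
    total = 0.0
    for s in samples:
        sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, s))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / (len(samples) * norm)


def select_frontier_goals(
    candidates: Sequence[Goal],
    achieved: Sequence[Goal],
    k: int,
    bandwidth: float = 0.1,
) -> List[Goal]:
    """Return the k candidates with the lowest estimated density under achieved goals."""
    ranked = sorted(candidates, key=lambda g: kde_density(g, achieved, bandwidth))
    return list(ranked[:k])


if __name__ == "__main__":
    achieved = [[0.0, 0.0], [0.05, 0.0], [0.0, 0.05]]  # goals reached so far
    candidates = [[0.02, 0.02], [0.3, 0.0], [1.0, 1.0]]
    print(select_frontier_goals(candidates, achieved, k=1))  # [[1.0, 1.0]]
```

A pure lowest-density rule can favor goals that are currently unreachable, so a practical curriculum would usually restrict candidates to a plausible region or balance novelty against achievability.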
CV · Aug 29, 2021
MBDF-Net: Multi-Branch Deep Fusion Network for 3D Object Detection
Xun Tan, Xingyu Chen, Guowei Zhang et al.
Point clouds and images can provide complementary information when representing 3D objects, and fusing the two kinds of data usually improves detection results. However, fusing the two modalities is challenging because of their different characteristics and the interference from non-interest areas. To address this problem, we propose a Multi-Branch Deep Fusion Network (MBDF-Net) for 3D object detection. The proposed detector has two stages. In the first stage, our multi-branch feature extraction network uses Adaptive Attention Fusion (AAF) modules to produce cross-modal fusion features from single-modal semantic features. In the second stage, we use a region-of-interest (RoI)-pooled fusion module to generate enhanced local features for refinement. We also propose a novel attention-based hybrid sampling strategy for selecting key points during downsampling. We evaluate our approach on two widely used benchmark datasets, KITTI and SUN-RGBD. The experimental results demonstrate the advantages of our method over state-of-the-art approaches.
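To illustrate what an attention-weighted fusion of two modalities can look like, the sketch below blends a point-cloud feature vector and an image feature vector channel by channel with a sigmoid gate computed from both inputs. The parameter names (`w_point`, `w_image`, `bias`) and the gating form are hypothetical; this is a generic gated-fusion pattern, not the Adaptive Attention Fusion (AAF) module described in the paper.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(
    point_feat: Sequence[float],
    image_feat: Sequence[float],
    w_point: Sequence[float],
    w_image: Sequence[float],
    bias: Sequence[float],
) -> List[float]:
    """Channel-wise gated fusion of two equally sized feature vectors.

    For channel c, g_c = sigmoid(w_point[c] * p_c + w_image[c] * i_c + bias[c])
    sets how much of the point feature (g_c) versus the image feature (1 - g_c)
    is kept, so a noisy channel from one modality can be down-weighted.
    """
    if not (len(point_feat) == len(image_feat) == len(w_point) == len(w_image) == len(bias)):
        raise ValueError("all inputs must have the same length")
    fused: List[float] = []
    for p, i, wp, wi, b in zip(point_feat, image_feat, w_point, w_image, bias):
        g = sigmoid(wp * p + wi * i + b)
        fused.append(g * p + (1.0 - g) * i)
    return fused


if __name__ == "__main__":
    # With all-zero gate parameters every gate is 0.5, so the result is the average.
    print(gated_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))  # [2.0, 3.0]
```

In a trained network the gate parameters would be learned, so the balance between modalities can differ per channel and per input rather than being a fixed average.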