CV · Nov 30, 2020
End-to-End 3D Point Cloud Learning for Registration Task Using Virtual Correspondences
Zhijian Qiao, Huanshu Wei, Zhe Liu et al.
3D point cloud registration remains a challenging problem because it is difficult to find the rigid transformation between two point clouds that share only partial correspondences, and it is even harder when no initial estimate is available. In this paper, we present an end-to-end deep-learning-based approach to the point cloud registration problem. First, a revised LPD-Net is introduced to extract features and aggregate them with a graph network. Second, a self-attention mechanism is used to enhance the structural information within each point cloud, and a cross-attention mechanism is designed to enhance the correspondence information between the two input point clouds. Based on these features, virtual corresponding points are generated by a soft-pointer-based method, and the registration problem is finally solved with the SVD method. Comparison results on the ModelNet40 dataset show that the proposed approach reaches state-of-the-art performance on point cloud registration, and experimental results on the KITTI dataset validate its effectiveness in real applications. Our source code is available at https://github.com/qiaozhijian/VCR-Net.git
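A minimal NumPy sketch of the last two steps, assuming per-point feature matrices have already been produced (the LPD-Net and attention stages are not reproduced): each source point receives a virtual correspondence as a softmax-weighted average of the target points, and the rigid transform is then recovered with the standard SVD (Kabsch) solution. The function names and the `temperature` parameter are illustrative assumptions, not taken from the VCR-Net code.

```python
import numpy as np

def soft_virtual_correspondences(src_feat, tgt_feat, tgt_pts, temperature=1.0):
    """Hypothetical helper: build one 'virtual' target point per source point as a
    softmax-weighted average of all target points (a soft pointer over feature similarity)."""
    logits = src_feat @ tgt_feat.T / temperature    # (N, M) similarity scores
    logits -= logits.max(axis=1, keepdims=True)     # subtract row max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)   # each row now sums to 1
    return weights @ tgt_pts                        # (N, 3) virtual corresponding points

def rigid_transform_svd(src, dst):
    """Least-squares R, t such that dst ~= src @ R.T + t (Kabsch / orthogonal Procrustes)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)             # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # flip sign to avoid a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Usage on synthetic data: rotate/translate a random cloud, then recover the motion.
rng = np.random.default_rng(0)
src = rng.normal(size=(20, 3))
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.5, -0.2, 1.0])
tgt = src @ R_true.T + t_true
feat = np.eye(20)  # toy features: each point is similar only to its own counterpart
virtual = soft_virtual_correspondences(feat, feat, tgt, temperature=0.01)
R_est, t_est = rigid_transform_svd(src, virtual)
print(np.allclose(R_est, R_true, atol=1e-6), np.allclose(t_est, t_true, atol=1e-6))  # expected: True True
```

Because every virtual point is a weighted average rather than a hard nearest-neighbour pick, the pipeline stays differentiable, which is what allows end-to-end training through the SVD step in approaches of this type.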
RO · Dec 11, 2025
WholeBodyVLA: Towards Unified Latent VLA for Whole-Body Loco-Manipulation Control
Haoran Jiang, Jin Chen, Qingwen Bu et al.
Humanoid robots require precise locomotion and dexterous manipulation to perform challenging loco-manipulation tasks. Yet existing approaches, whether modular or end-to-end, lack manipulation-aware locomotion. This confines the robot to a limited workspace and prevents it from performing large-space loco-manipulation. We attribute this to two factors: (1) the difficulty of acquiring loco-manipulation knowledge, given the scarcity of humanoid teleoperation data, and (2) the difficulty of executing locomotion commands faithfully and reliably, stemming from the limited precision and stability of existing RL controllers. To acquire richer loco-manipulation knowledge, we propose a unified latent learning framework that enables a Vision-Language-Action (VLA) system to learn from low-cost, action-free egocentric videos. In addition, an efficient human data collection pipeline is devised to augment the dataset and scale these benefits. To execute the desired locomotion commands more precisely, we present a loco-manipulation-oriented (LMO) RL policy tailored for accurate and stable core loco-manipulation movements such as advancing, turning, and squatting. Building on these components, we introduce WholeBodyVLA, a unified framework for humanoid loco-manipulation. To the best of our knowledge, WholeBodyVLA is one of a kind in enabling large-space humanoid loco-manipulation. It is validated through comprehensive experiments on the AgiBot X2 humanoid, where it outperforms the prior baseline by 21.3%. It also demonstrates strong generalization and high extensibility across a broad range of tasks.
CV · Apr 30, 2019
SeqLPD: Sequence Matching Enhanced Loop-Closure Detection Based on Large-Scale Point Cloud Description for Self-Driving Vehicles
Zhe Liu, Chuanzhe Suo, Shunbo Zhou et al.
Place recognition and loop-closure detection are major challenges in the localization, mapping, and navigation tasks of self-driving vehicles. In this paper, we solve the loop-closure detection problem by combining a deep-learning-based point cloud description method with a coarse-to-fine sequence matching strategy. More specifically, we propose a deep neural network that extracts a global descriptor from the raw large-scale 3D point cloud; based on these descriptors, a typical place analysis approach is presented to investigate the feature-space distribution of the global descriptors and select several super keyframes. Finally, a coarse-to-fine strategy, consisting of a super-keyframe-based coarse matching stage and a local sequence matching stage, is presented to ensure both loop-closure detection accuracy and real-time performance. Thanks to the sequence matching operation, the proposed approach improves on existing deep-learning-based methods. Experimental results on a self-driving vehicle validate the effectiveness of the proposed loop-closure detection algorithm.
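The coarse-to-fine idea can be illustrated with a hypothetical sketch: the newest scan is first compared only against a small set of super keyframes, and a sequence distance is then evaluated in a local window around the best candidates. The super-keyframe selection (the typical place analysis) is assumed to be given, and the Euclidean descriptor distance, the `top_k`/`window` parameters, and the constant-rate sequence alignment are simplifying assumptions, not details from SeqLPD.

```python
import numpy as np

def coarse_to_fine_loop_closure(query_seq, db, super_kf_idx, top_k=2, window=5):
    """Hypothetical coarse-to-fine loop-closure matcher.

    query_seq:    (L, D) global descriptors of the L most recent scans (oldest first).
    db:           (N, D) global descriptors of previously visited scans.
    super_kf_idx: indices into db used as super keyframes (assumed precomputed).
    Returns (best_db_index, sequence_distance) for the newest query scan.
    """
    L = len(query_seq)
    # Coarse stage: compare only the newest scan with the super keyframes.
    coarse = np.linalg.norm(db[super_kf_idx] - query_seq[-1], axis=1)
    candidates = np.asarray(super_kf_idx)[np.argsort(coarse)[:top_k]]

    best_idx, best_dist = -1, np.inf
    for c in candidates:
        # Fine stage: scan database frames in a local window around each candidate.
        lo, hi = max(L - 1, c - window), min(len(db) - 1, c + window)
        for end in range(lo, hi + 1):
            # Align the query sequence with db[end-L+1 .. end] (same frame rate assumed).
            seq = db[end - L + 1 : end + 1]
            dist = np.linalg.norm(seq - query_seq, axis=1).mean()
            if dist < best_dist:
                best_idx, best_dist = end, dist
    return best_idx, best_dist

# Toy example: descriptors drift linearly along the trajectory, and the vehicle
# revisits frames 20..23 with a little descriptor noise.
rng = np.random.default_rng(0)
db = 0.1 * np.arange(50)[:, None] * np.ones((1, 8))
query_seq = db[20:24] + 1e-3 * rng.normal(size=(4, 8))
idx, dist = coarse_to_fine_loop_closure(query_seq, db, super_kf_idx=list(range(0, 50, 5)))
print(idx)  # expected: 23
```

The coarse stage keeps the per-query cost proportional to the number of super keyframes rather than the full map, while the sequence distance in the fine stage makes it harder for a single ambiguous scan to trigger a false loop closure.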
CV · Dec 11, 2018
LPD-Net: 3D Point Cloud Learning for Large-Scale Place Recognition and Environment Analysis
Zhe Liu, Shunbo Zhou, Chuanzhe Suo et al.
Point cloud based place recognition remains an open problem because of the difficulty of extracting local features from the raw 3D point cloud and generating a global descriptor, and it is even harder in large-scale dynamic environments. In this paper, we develop a novel deep neural network, named LPD-Net (Large-scale Place Description Network), which extracts discriminative and generalizable global descriptors from the raw 3D point cloud. Two modules are proposed, an adaptive local feature extraction module and a graph-based neighborhood aggregation module, which help extract the local structures and reveal the spatial distribution of local features in the large-scale point cloud in an end-to-end manner. We apply the proposed global descriptor to point cloud based retrieval tasks to achieve large-scale place recognition. Comparison results show that LPD-Net clearly outperforms PointNetVLAD and reaches the state of the art. We also compare LPD-Net with vision-based solutions to show the robustness of our approach to different weather and lighting conditions.
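Retrieval with global descriptors reduces to nearest-neighbour search in descriptor space. The sketch below assumes fixed-length descriptors compared by cosine similarity and reports recall@k; the descriptor network itself is not reproduced, the function names are illustrative, and random vectors stand in for LPD-Net outputs.

```python
import numpy as np

def retrieve(query_desc, db_desc, k=1):
    """Return, for each query, the indices of the k most similar database descriptors.
    Assumes cosine similarity after L2 normalization (an illustrative choice)."""
    q = query_desc / np.linalg.norm(query_desc, axis=1, keepdims=True)
    d = db_desc / np.linalg.norm(db_desc, axis=1, keepdims=True)
    sim = q @ d.T                          # (Q, N) cosine similarities
    return np.argsort(-sim, axis=1)[:, :k]  # highest similarity first

def recall_at_k(topk, true_idx):
    """Fraction of queries whose true match appears among their top-k retrievals."""
    hits = [t in row for row, t in zip(topk, true_idx)]
    return float(np.mean(hits))

# Usage with stand-in descriptors: three queries are noisy revisits of known places.
rng = np.random.default_rng(0)
db = rng.normal(size=(100, 256))
true_idx = [3, 42, 77]
queries = db[true_idx] + 0.05 * rng.normal(size=(3, 256))
topk = retrieve(queries, db, k=1)
print(recall_at_k(topk, true_idx))  # expected: 1.0
```

For large maps the brute-force similarity matrix would typically be replaced by an approximate nearest-neighbour index, but the retrieval logic stays the same.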