Xuecheng Xu

CV
15papers
333citations
Novelty57%
AI Score34

15 Papers

ROMar 2, 2022
Translation Invariant Global Estimation of Heading Angle Using Sinogram of LiDAR Point Cloud

Xiaqing Ding, Xuecheng Xu, Sha Lu et al.

Global point cloud registration is an essential module for localization, of which the main difficulty exists in estimating the rotation globally without initial value. With the aid of gravity alignment, the degree of freedom in point cloud registration could be reduced to 4DoF, in which only the heading angle is required for rotation estimation. In this paper, we propose a fast and accurate global heading angle estimation method for gravity-aligned point clouds. Our key idea is that we generate a translation invariant representation based on Radon Transform, allowing us to solve the decoupled heading angle globally with circular cross-correlation. Besides, for heading angle estimation between point clouds with different distributions, we implement this heading angle estimator as a differentiable module to train a feature extraction network end- to-end. The experimental results validate the effectiveness of the proposed method in heading angle estimation and show better performance compared with other methods.

ROOct 12, 2022
RING++: Roto-translation Invariant Gram for Global Localization on a Sparse Scan Map

Xuecheng Xu, Sha Lu, Jun Wu et al.

Global localization plays a critical role in many robot applications. LiDAR-based global localization draws the community's focus with its robustness against illumination and seasonal changes. To further improve the localization under large viewpoint differences, we propose RING++ which has roto-translation invariant representation for place recognition, and global convergence for both rotation and translation estimation. With the theoretical guarantee, RING++ is able to address the large viewpoint difference using a lightweight map with sparse scans. In addition, we derive sufficient conditions of feature extractors for the representation preserving the roto-translation invariance, making RING++ a framework applicable to generic multi-channel features. To the best of our knowledge, this is the first learning-free framework to address all subtasks of global localization in the sparse scan map. Validations on real-world datasets show that our approach demonstrates better performance than state-of-the-art learning-free methods, and competitive performance with learning-based methods. Finally, we integrate RING++ into a multi-robot/session SLAM system, performing its effectiveness in collaborative applications.

CVOct 20, 2022
DeepRING: Learning Roto-translation Invariant Representation for LiDAR based Place Recognition

Sha Lu, Xuecheng Xu, Li Tang et al.

LiDAR based place recognition is popular for loop closure detection and re-localization. In recent years, deep learning brings improvements to place recognition by learnable feature extraction. However, these methods degenerate when the robot re-visits previous places with large perspective difference. To address the challenge, we propose DeepRING to learn the roto-translation invariant representation from LiDAR scan, so that robot visits the same place with different perspective can have similar representations. There are two keys in DeepRING: the feature is extracted from sinogram, and the feature is aggregated by magnitude spectrum. The two steps keeps the final representation with both discrimination and roto-translation invariance. Moreover, we state the place recognition as a one-shot learning problem with each place being a class, leveraging relation learning to build representation similarity. Substantial experiments are carried out on public datasets, validating the effectiveness of each proposed component, and showing that DeepRING outperforms the comparative methods, especially in dataset level generalization.

CVJun 12, 2022
DPCN++: Differentiable Phase Correlation Network for Versatile Pose Registration

Zexi Chen, Yiyi Liao, Haozhe Du et al.

Pose registration is critical in vision and robotics. This paper focuses on the challenging task of initialization-free pose registration up to 7DoF for homogeneous and heterogeneous measurements. While recent learning-based methods show promise using differentiable solvers, they either rely on heuristically defined correspondences or are prone to local minima. We present a differentiable phase correlation (DPC) solver that is globally convergent and correspondence-free. When combined with simple feature extraction networks, our general framework DPCN++ allows for versatile pose registration with arbitrary initialization. Specifically, the feature extraction networks first learn dense feature grids from a pair of homogeneous/heterogeneous measurements. These feature grids are then transformed into a translation and scale invariant spectrum representation based on Fourier transform and spherical radial aggregation, decoupling translation and scale from rotation. Next, the rotation, scale, and translation are independently and efficiently estimated in the spectrum step-by-step using the DPC solver. The entire pipeline is differentiable and trained end-to-end. We evaluate DCPN++ on a wide range of registration tasks taking different input modalities, including 2D bird's-eye view images, 3D object and scene measurements, and medical images. Experimental results demonstrate that DCPN++ outperforms both classical and learning-based baselines, especially on partially observed and heterogeneous measurements.

CVAug 30, 2024
RING#: PR-by-PE Global Localization with Roto-translation Equivariant Gram Learning

Sha Lu, Xuecheng Xu, Yuxuan Wu et al.

Global localization using onboard perception sensors, such as cameras and LiDARs, is crucial in autonomous driving and robotics applications when GPS signals are unreliable. Most approaches achieve global localization by sequential place recognition (PR) and pose estimation (PE). Some methods train separate models for each task, while others employ a single model with dual heads, trained jointly with separate task-specific losses. However, the accuracy of localization heavily depends on the success of place recognition, which often fails in scenarios with significant changes in viewpoint or environmental appearance. Consequently, this renders the final pose estimation of localization ineffective. To address this, we introduce a new paradigm, PR-by-PE localization, which bypasses the need for separate place recognition by directly deriving it from pose estimation. We propose RING#, an end-to-end PR-by-PE localization network that operates in the bird's-eye-view (BEV) space, compatible with both vision and LiDAR sensors. RING# incorporates a novel design that learns two equivariant representations from BEV features, enabling globally convergent and computationally efficient pose estimation. Comprehensive experiments on the NCLT and Oxford datasets show that RING# outperforms state-of-the-art methods in both vision and LiDAR modalities, validating the effectiveness of the proposed approach. The code will be publicly released.

ROSep 20, 2021Code
HiTMap: A Hierarchical Topological Map Representation for Navigation in Unknown Environments

Xuecheng Xu, Cheng Wang, Yue Wang et al.

The ability to autonomously navigate in unknown environments is important for mobile robots. The map is the core component to achieve this. Most map representations rely on drift-free state estimation and provide a global metric map to navigate. However, in large-scale real-world applications, it's hard to prohibit drifts and compose a globally consistent map quickly. In this paper, a novel representation named, HiTMap, is proposed to enhance the existing map representations. The central idea is to adopt a submap-based hierarchical topology rather than a global metric map so that only a local metric map is maintained for obstacle avoidance which ensures the lightweight of the representation. To guide the robots navigate into unknown spaces, frontiers are detected and attached to the map as an attribute. We also develop a path planning module to evaluate the feasibility and efficiency of our map representation. The system is validated in a simulation environment and a demonstration in the real world is conducted. In addition, the HiTMap is made available open-source.

CVJan 30, 2021Code
Radar-to-Lidar: Heterogeneous Place Recognition via Joint Learning

Huan Yin, Xuecheng Xu, Yue Wang et al.

Place recognition is critical for both offline mapping and online localization. However, current single-sensor based place recognition still remains challenging in adverse conditions. In this paper, a heterogeneous measurements based framework is proposed for long-term place recognition, which retrieves the query radar scans from the existing lidar maps. To achieve this, a deep neural network is built with joint training in the learning stage, and then in the testing stage, shared embeddings of radar and lidar are extracted for heterogeneous place recognition. To validate the effectiveness of the proposed method, we conduct tests and generalization experiments on the multi-session public datasets compared to other competitive methods. The experimental results indicate that our model is able to perform multiple place recognitions: lidar-to-lidar, radar-to-radar and radar-to-lidar, while the learned model is trained only once. We also release the source code publicly: https://github.com/ZJUYH/radar-to-lidar-place-recognition.

CVOct 31, 2020Code
PREGAN: Pose Randomization and Estimation for Weakly Paired Image Style Translation

Zexi Chen, Jiaxin Guo, Xuecheng Xu et al.

Utilizing the trained model under different conditions without data annotation is attractive for robot applications. Towards this goal, one class of methods is to translate the image style from another environment to the one on which models are trained. In this paper, we propose a weakly-paired setting for the style translation, where the content in the two images is aligned with errors in poses. These images could be acquired by different sensors in different conditions that share an overlapping region, e.g. with LiDAR or stereo cameras, from sunny days or foggy nights. We consider this setting to be more practical with: (i) easier labeling than the paired data; (ii) better interpretability and detail retrieval than the unpaired data. To translate across such images, we propose PREGAN to train a style translator by intentionally transforming the two images with a random pose, and to estimate the given random pose by differentiable non-trainable pose estimator given that the more aligned in style, the better the estimated result is. Such adversarial training enforces the network to learn the style translation, avoiding being entangled with other variations. Finally, PREGAN is validated on both simulated and real-world collected data to show the effectiveness. Results on down-stream tasks, classification, road segmentation, object detection, and feature matching show its potential for real applications. https://github.com/wrld/PRoGAN

CVAug 21, 2020Code
Deep Phase Correlation for End-to-End Heterogeneous Sensor Measurements Matching

Zexi Chen, Xuecheng Xu, Yue Wang et al.

The crucial step for localization is to match the current observation to the map. When the two sensor modalities are significantly different, matching becomes challenging. In this paper, we present an end-to-end deep phase correlation network (DPCN) to match heterogeneous sensor measurements. In DPCN, the primary component is a differentiable correlation-based estimator that back-propagates the pose error to learnable feature extractors, which addresses the problem that there are no direct common features for supervision. Also, it eliminates the exhaustive evaluation in some previous methods, improving efficiency. With the interpretable modeling, the network is light-weighted and promising for better generalization. We evaluate the system on both the simulation data and Aero-Ground Dataset which consists of heterogeneous sensor images and aerial images acquired by satellites or aerial robots. The results show that our method is able to match the heterogeneous sensor measurements, outperforming the comparative traditional phase correlation and other learning-based methods. Code is available at https://github.com/jessychen1016/DPCN .

CVMay 23, 2023
Leveraging BEV Representation for 360-degree Visual Place Recognition

Xuecheng Xu, Yanmei Jiao, Sha Lu et al.

This paper investigates the advantages of using Bird's Eye View (BEV) representation in 360-degree visual place recognition (VPR). We propose a novel network architecture that utilizes the BEV representation in feature extraction, feature aggregation, and vision-LiDAR fusion, which bridges visual cues and spatial awareness. Our method extracts image features using standard convolutional networks and combines the features according to pre-defined 3D grid spatial points. To alleviate the mechanical and time misalignments between cameras, we further introduce deformable attention to learn the compensation. Upon the BEV feature representation, we then employ the polar transform and the Discrete Fourier transform for aggregation, which is shown to be rotation-invariant. In addition, the image and point cloud cues can be easily stated in the same coordinates, which benefits sensor fusion for place recognition. The proposed BEV-based method is evaluated in ablation and comparative studies on two datasets, including on-the-road and off-the-road scenarios. The experimental results verify the hypothesis that BEV can benefit VPR by its superior performance compared to baseline methods. To the best of our knowledge, this is the first trial of employing BEV representation in this task.

ROSep 25, 2021
Learning Interpretable BEV Based VIO without Deep Neural Networks

Zexi Chen, Haozhe Du, Xuecheng Xu et al.

Monocular visual-inertial odometry (VIO) is a critical problem in robotics and autonomous driving. Traditional methods solve this problem based on filtering or optimization. While being fully interpretable, they rely on manual interference and empirical parameter tuning. On the other hand, learning-based approaches allow for end-to-end training but require a large number of training data to learn millions of parameters. However, the non-interpretable and heavy models hinder the generalization ability. In this paper, we propose a fully differentiable, and interpretable, bird-eye-view (BEV) based VIO model for robots with local planar motion that can be trained without deep neural networks. Specifically, we first adopt Unscented Kalman Filter as a differentiable layer to predict the pitch and roll, where the covariance matrices of noise are learned to filter out the noise of the IMU raw data. Second, the refined pitch and roll are adopted to retrieve a gravity-aligned BEV image of each frame using differentiable camera projection. Finally, a differentiable pose estimator is utilized to estimate the remaining 3 DoF poses between the BEV frames: leading to a 5 DoF pose estimation. Our method allows for learning the covariance matrices end-to-end supervised by the pose estimation loss, demonstrating superior performance to empirical baselines. Experimental results on synthetic and real-world datasets demonstrate that our simple approach is competitive with state-of-the-art methods and generalizes well on unseen scenes.

CVMar 7, 2021
Learn to Differ: Sim2Real Small Defection Segmentation Network

Zexi Chen, Zheyuan Huang, Yunkai Wang et al.

Recent studies on deep-learning-based small defection segmentation approaches are trained in specific settings and tend to be limited by fixed context. Throughout the training, the network inevitably learns the representation of the background of the training data before figuring out the defection. They underperform in the inference stage once the context changed and can only be solved by training in every new setting. This eventually leads to the limitation in practical robotic applications where contexts keep varying. To cope with this, instead of training a network context by context and hoping it to generalize, why not stop misleading it with any limited context and start training it with pure simulation? In this paper, we propose the network SSDS that learns a way of distinguishing small defections between two images regardless of the context, so that the network can be trained once for all. A small defection detection layer utilizing the pose sensitivity of phase correlation between images is introduced and is followed by an outlier masking layer. The network is trained on randomly generated simulated data with simple shapes and is generalized across the real world. Finally, SSDS is validated on real-world collected data and demonstrates the ability that even when trained in cheap simulation, SSDS can still find small defections in the real world showing the effectiveness and its potential for practical applications.

CVNov 22, 2020
CORAL: Colored structural representation for bi-modal place recognition

Yiyuan Pan, Xuecheng Xu, Weijie Li et al.

Place recognition is indispensable for a drift-free localization system. Due to the variations of the environment, place recognition using single-modality has limitations. In this paper, we propose a bi-modal place recognition method, which can extract a compound global descriptor from the two modalities, vision and LiDAR. Specifically, we first build the elevation image generated from 3D points as a structural representation. Then, we derive the correspondences between 3D points and image pixels that are further used in merging the pixel-wise visual features into the elevation map grids. In this way, we fuse the structural features and visual features in the consistent bird-eye view frame, yielding a semantic representation, namely CORAL. And the whole network is called CORAL-VLAD. Comparisons on the Oxford RobotCar show that CORAL-VLAD has superior performance against other state-of-the-art methods. We also demonstrate that our network can be generalized to other scenes and sensor configurations on cross-city datasets.

ROOct 21, 2020
DiSCO: Differentiable Scan Context with Orientation

Xuecheng Xu, Huan Yin, Zexi Chen et al.

Global localization is essential for robot navigation, of which the first step is to retrieve a query from the map database. This problem is called place recognition. In recent years, LiDAR scan based place recognition has drawn attention as it is robust against the appearance change. In this paper, we propose a LiDAR-based place recognition method, named Differentiable Scan Context with Orientation (DiSCO), which simultaneously finds the scan at a similar place and estimates their relative orientation. The orientation can further be used as the initial value for the down-stream local optimal metric pose estimation, improving the pose estimation especially when a large orientation between the current scan and retrieved scan exists. Our key idea is to transform the feature into the frequency domain. We utilize the magnitude of the spectrum as the place signature, which is theoretically rotation-invariant. In addition, based on the differentiable phase correlation, we can efficiently estimate the global optimal relative orientation using the spectrum. With such structural constraints, the network can be learned in an end-to-end manner, and the backbone is fully shared by the two tasks, achieving interpretability and light weight. Finally, DiSCO is validated on three datasets with long-term outdoor conditions, showing better performance than the compared methods.

ROJul 22, 2020
Collaborative Localization of Aerial and Ground Mobile Robots through Orthomosaic Map

Xuecheng Xu, Zexi Chen, Jiaxin Guo et al.

With the deepening of research on the SLAM system, the possibility of cooperative SLAM with multi-robots has been proposed. This paper presents a map matching and localization approach considering the cooperative SLAM of an aerial-ground system. The proposed approach aims to help precisely matching the map constructed by two independent systems that have large scale variance of viewpoints of the same route and eventually enables the ground mobile robot to localize itself in the global map given by the drone. It contains dense mapping with Elevation Map and software "Metashape", map matching with a proposed template matching algorithm, weighted normalized cross-correlation (WNCC) and localization with particle filter. The approach enables map matching for cooperative SLAM with the feasibility of multiple scene sensors, varies from stereo cameras to lidars, and is insensitive to the synchronization of the two systems. We demonstrate the accuracy, robustness, and the speed of the approach under experiments of the Aero-Ground Dataset.