Haoyang Ye

RO
h-index8
23papers
1,282citations
Novelty40%
AI Score40

23 Papers

CVJan 30
Hybrid Cross-Device Localization via Neural Metric Learning and Feature Fusion

Meixia Lin, Mingkai Liu, Shuxue Peng et al.

We present a hybrid cross-device localization pipeline developed for the CroCoDL 2025 Challenge. Our approach integrates a shared retrieval encoder and two complementary localization branches: a classical geometric branch using feature fusion and PnP, and a neural feed-forward branch (MapAnything) for metric localization conditioned on geometric inputs. A neural-guided candidate pruning strategy further filters unreliable map frames based on translation consistency, while depth-conditioned localization refines metric scale and translation precision on Spot scenes. These components jointly lead to significant improvements in recall and accuracy across both HYDRO and SUCCU benchmarks. Our method achieved a final score of 92.62 (R@0.5m, 5°) during the challenge.

RONov 9, 2020Code
Geometric Structure Aided Visual Inertial Localization

Huaiyang Huang, Haoyang Ye, Jianhao Jiao et al.

Visual Localization is an essential component in autonomous navigation. Existing approaches are either based on the visual structure from SLAM/SfM or the geometric structure from dense mapping. To take the advantages of both, in this work, we present a complete visual inertial localization system based on a hybrid map representation to reduce the computational cost and increase the positioning accuracy. Specially, we propose two modules for data association and batch optimization, respectively. To this end, we develop an efficient data association module to associate map components with local features, which takes only $2$ms to generate temporal landmarks. For batch optimization, instead of using visual factors, we develop a module to estimate a pose prior from the instant localization results to constrain poses. The experimental results on the EuRoC MAV dataset demonstrate a competitive performance compared to the state of the arts. Specially, our system achieves an average position error in 1.7 cm with 100% recall. The timings show that the proposed modules reduce the computational cost by 20-30%. We will make our implementation open source at http://github.com/hyhuang1995/gmmloc.

ROJun 7, 2019Code
Key Ingredients of Self-Driving Cars

Rui Fan, Jianhao Jiao, Haoyang Ye et al.

Over the past decade, many research articles have been published in the area of autonomous driving. However, most of them focus only on a specific technological area, such as visual environment perception, vehicle control, etc. Furthermore, due to fast advances in the self-driving car technology, such articles become obsolete very fast. In this paper, we give a brief but comprehensive overview on key ingredients of autonomous cars (ACs), including driving automation levels, AC sensors, AC software, open source datasets, industry leaders, AC applications and existing challenges.

AIFeb 20, 2024
Roadmap on Incentive Compatibility for AI Alignment and Governance in Sociotechnical Systems

Zhaowei Zhang, Fengshuo Bai, Mingzhi Wang et al.

The burgeoning integration of artificial intelligence (AI) into human society brings forth significant implications for societal governance and safety. While considerable strides have been made in addressing AI alignment challenges, existing methodologies primarily focus on technical facets, often neglecting the intricate sociotechnical nature of AI systems, which can lead to a misalignment between the development and deployment contexts. To this end, we posit a new problem worth exploring: Incentive Compatibility Sociotechnical Alignment Problem (ICSAP). We hope this can call for more researchers to explore how to leverage the principles of Incentive Compatibility (IC) from game theory to bridge the gap between technical and societal components to maintain AI consensus with human societies in different contexts. We further discuss three classical game problems for achieving IC: mechanism design, contract theory, and Bayesian persuasion, in addressing the perspectives, potentials, and challenges of solving ICSAP, and provide preliminary implementation conceptions.

CVJun 24, 2024
Crowd-Sourced NeRF: Collecting Data from Production Vehicles for 3D Street View Reconstruction

Tong Qin, Changze Li, Haoyang Ye et al.

Recently, Neural Radiance Fields (NeRF) achieved impressive results in novel view synthesis. Block-NeRF showed the capability of leveraging NeRF to build large city-scale models. For large-scale modeling, a mass of image data is necessary. Collecting images from specially designed data-collection vehicles can not support large-scale applications. How to acquire massive high-quality data remains an opening problem. Noting that the automotive industry has a huge amount of image data, crowd-sourcing is a convenient way for large-scale data collection. In this paper, we present a crowd-sourced framework, which utilizes substantial data captured by production vehicles to reconstruct the scene with the NeRF model. This approach solves the key problem of large-scale reconstruction, that is where the data comes from and how to use them. Firstly, the crowd-sourced massive data is filtered to remove redundancy and keep a balanced distribution in terms of time and space. Then a structure-from-motion module is performed to refine camera poses. Finally, images, as well as poses, are used to train the NeRF model in a certain block. We highlight that we present a comprehensive framework that integrates multiple modules, including data selection, sparse 3D reconstruction, sequence appearance embedding, depth supervision of ground surface, and occlusion completion. The complete system is capable of effectively processing and reconstructing high-quality 3D scenes from crowd-sourced data. Extensive quantitative and qualitative experiments were conducted to validate the performance of our system. Moreover, we proposed an application, named first-view navigation, which leveraged the NeRF model to generate 3D street view and guide the driver with a synthesized video.

ROAug 4, 2021
Incorporating Learnt Local and Global Embeddings into Monocular Visual SLAM

Huaiyang Huang, Haoyang Ye, Yuxiang Sun et al.

Traditional approaches for Visual Simultaneous Localization and Mapping (VSLAM) rely on low-level vision information for state estimation, such as handcrafted local features or the image gradient. While significant progress has been made through this track, under more challenging configuration for monocular VSLAM, e.g., varying illumination, the performance of state-of-the-art systems generally degrades. As a consequence, robustness and accuracy for monocular VSLAM are still widely concerned. This paper presents a monocular VSLAM system that fully exploits learnt features for better state estimation. The proposed system leverages both learnt local features and global embeddings at different modules of the system: direct camera pose estimation, inter-frame feature association, and loop closure detection. With a probabilistic explanation of keypoint prediction, we formulate the camera pose tracking in a direct manner and parameterize local features with uncertainty taken into account. To alleviate the quantization effect, we adapt the mapping module to generate 3D landmarks better to guarantee the system's robustness. Detecting temporal loop closure via deep global embeddings further improves the robustness and accuracy of the proposed system. The proposed system is extensively evaluated on public datasets (Tsukuba, EuRoC, and KITTI), and compared against the state-of-the-art methods. The competitive performance of camera pose estimation confirms the effectiveness of our method.

CVApr 8, 2021
3D Surfel Map-Aided Visual Relocalization with Learned Descriptors

Haoyang Ye, Huaiyang Huang, Marco Hutter et al.

In this paper, we introduce a method for visual relocalization using the geometric information from a 3D surfel map. A visual database is first built by global indices from the 3D surfel map rendering, which provides associations between image points and 3D surfels. Surfel reprojection constraints are utilized to optimize the keyframe poses and map points in the visual database. A hierarchical camera relocalization algorithm then utilizes the visual database to estimate 6-DoF camera poses. Learned descriptors are further used to improve the performance in challenging cases. We present evaluation under real-world conditions and simulation to show the effectiveness and efficiency of our method, and make the final camera poses consistently well aligned with the 3D environment.

ROMar 24, 2021
Greedy-Based Feature Selection for Efficient LiDAR SLAM

Jianhao Jiao, Yilong Zhu, Haoyang Ye et al.

Modern LiDAR-SLAM (L-SLAM) systems have shown excellent results in large-scale, real-world scenarios. However, they commonly have a high latency due to the expensive data association and nonlinear optimization. This paper demonstrates that actively selecting a subset of features significantly improves both the accuracy and efficiency of an L-SLAM system. We formulate the feature selection as a combinatorial optimization problem under a cardinality constraint to preserve the information matrix's spectral attributes. The stochastic-greedy algorithm is applied to approximate the optimal results in real-time. To avoid ill-conditioned estimation, we also propose a general strategy to evaluate the environment's degeneracy and modify the feature number online. The proposed feature selector is integrated into a multi-LiDAR SLAM system. We validate this enhanced system with extensive experiments covering various scenarios on two sensor setups and computation platforms. We show that our approach exhibits low localization error and speedup compared to the state-of-the-art L-SLAM systems. To benefit the community, we have released the source code: https://ram-lab.com/file/site/m-loam.

ROOct 27, 2020
Robust Odometry and Mapping for Multi-LiDAR Systems with Online Extrinsic Calibration

Jianhao Jiao, Haoyang Ye, Yilong Zhu et al.

Combining multiple LiDARs enables a robot to maximize its perceptual awareness of environments and obtain sufficient measurements, which is promising for simultaneous localization and mapping (SLAM). This paper proposes a system to achieve robust and simultaneous extrinsic calibration, odometry, and mapping for multiple LiDARs. Our approach starts with measurement preprocessing to extract edge and planar features from raw measurements. After a motion and extrinsic initialization procedure, a sliding window-based multi-LiDAR odometry runs onboard to estimate poses with online calibration refinement and convergence identification. We further develop a mapping algorithm to construct a global map and optimize poses with sufficient features together with a method to model and reduce data uncertainty. We validate our approach's performance with extensive experiments on ten sequences (4.60km total length) for the calibration and SLAM and compare them against the state-of-the-art. We demonstrate that the proposed work is a complete, robust, and extensible system for various multi-LiDAR setups. The source code, datasets, and demonstrations are available at https://ram-lab.com/file/site/m-loam.

ROJun 24, 2020
GMMLoc: Structure Consistent Visual Localization with Gaussian Mixture Models

Huaiyang Huang, Haoyang Ye, Yuxiang Sun et al.

Incorporating prior structure information into the visual state estimation could generally improve the localization performance. In this letter, we aim to address the paradox between accuracy and efficiency in coupling visual factors with structure constraints. To this end, we present a cross-modality method that tracks a camera in a prior map modelled by the Gaussian Mixture Model (GMM). With the pose estimated by the front-end initially, the local visual observations and map components are associated efficiently, and the visual structure from the triangulation is refined simultaneously. By introducing the hybrid structure factors into the joint optimization, the camera poses are bundle-adjusted with the local visual structure. By evaluating our complete system, namely GMMLoc, on the public dataset, we show how our system can provide a centimeter-level localization accuracy with only trivial computational overhead. In addition, the comparative studies with the state-of-the-art vision-dominant state estimators demonstrate the competitive performance of our method.

ROApr 16, 2020
The Role of the Hercules Autonomous Vehicle During the COVID-19 Pandemic: An Autonomous Logistic Vehicle for Contactless Goods Transportation

Tianyu Liu, Qinghai Liao, Lu Gan et al.

Since early 2020, the coronavirus disease 2019 (COVID-19) has spread rapidly across the world. As at the date of writing this article, the disease has been globally reported in 223 countries and regions, infected over 108 million people and caused over 2.4 million deaths (https://covid19.who.int/, accessed on Feb. 17, 2021). Avoiding person-to-person transmission is an effective approach to control and prevent the pandemic. However, many daily activities, such as transporting goods in our daily life, inevitably involve person-to-person contact. Using an autonomous logistic vehicle to achieve contact-less goods transportation could alleviate this issue. For example, it can reduce the risk of virus transmission between the driver and customers. Moreover, many countries have imposed tough lockdown measures to reduce the virus transmission (e.g., retail, catering) during the pandemic, which causes inconveniences for human daily life. Autonomous vehicle can deliver the goods bought by humans, so that humans can get the goods without going out. These demands motivate us to develop an autonomous vehicle, named as Hercules, for contact-less goods transportation during the COVID-19 pandemic. The vehicle is evaluated through real-world delivering tasks under various traffic conditions.

ROMar 31, 2020
Metric Monocular Localization Using Signed Distance Fields

Huaiyang Huang, Yuxiang Sun, Haoyang Ye et al.

Metric localization plays a critical role in vision-based navigation. For overcoming the degradation of matching photometry under appearance changes, recent research resorted to introducing geometry constraints of the prior scene structure. In this paper, we present a metric localization method for the monocular camera, using the Signed Distance Field (SDF) as a global map representation. Leveraging the volumetric distance information from SDFs, we aim to relax the assumption of an accurate structure from the local Bundle Adjustment (BA) in previous methods. By tightly coupling the distance factor with temporal visual constraints, our system corrects the odometry drift and jointly optimizes global camera poses with the local structure. We validate the proposed approach on both indoor and outdoor public datasets. Compared to the state-of-the-art methods, it achieves a comparable performance with a minimal sensor configuration.

ROFeb 23, 2020
Monocular Direct Sparse Localization in a Prior 3D Surfel Map

Haoyang Ye, Huaiyang Huang, Ming Liu

In this paper, we introduce an approach to tracking the pose of a monocular camera in a prior surfel map. By rendering vertex and normal maps from the prior surfel map, the global planar information for the sparse tracked points in the image frame is obtained. The tracked points with and without the global planar information involve both global and local constraints of frames to the system. Our approach formulates all constraints in the form of direct photometric errors within a local window of the frames. The final optimization utilizes these constraints to provide the accurate estimation of global 6-DoF camera poses with the absolute scale. The extensive simulation and real-world experiments demonstrate that our monocular method can provide accurate camera localization results under various conditions.

ROJul 4, 2019
LINS: A Lidar-Inertial State Estimator for Robust and Efficient Navigation

Chao Qin, Haoyang Ye, Christian E. Pranata et al.

We present LINS, a lightweight lidar-inertial state estimator, for real-time ego-motion estimation. The proposed method enables robust and efficient navigation for ground vehicles in challenging environments, such as feature-less scenes, via fusing a 6-axis IMU and a 3D lidar in a tightly-coupled scheme. An iterated error-state Kalman filter (ESKF) is designed to correct the estimated state recursively by generating new feature correspondences in each iteration, and to keep the system computationally tractable. Moreover, we use a robocentric formulation that represents the state in a moving local frame in order to prevent filter divergence in a long run. To validate robustness and generalizability, extensive experiments are performed in various scenarios. Experimental results indicate that LINS offers comparable performance with the state-of-the-art lidar-inertial odometry in terms of stability and accuracy and has order-of-magnitude improvement in speed.

ROMay 13, 2019
Automatic Calibration of Multiple 3D LiDARs in Urban Environments

Jianhao Jiao, Yang Yu, Qinghai Liao et al.

Multiple LiDARs have progressively emerged on autonomous vehicles for rendering a wide field of view and dense measurements. However, the lack of precise calibration negatively affects their potential applications in localization and perception systems. In this paper, we propose a novel system that enables automatic multi-LiDAR calibration without any calibration target, prior environmental information, and initial values of the extrinsic parameters. Our approach starts with a hand-eye calibration for automatic initialization by aligning the estimated motions of each sensor. The resulting parameters are then refined with an appearance-based method by minimizing a cost function constructed from point-plane correspondences. Experimental results on simulated and real-world data sets demonstrate the reliability and accuracy of our calibration approach. The proposed approach can calibrate a multi-LiDAR system with the rotation and translation errors less than 0.04 [rad] and 0.1 [m] respectively for a mobile platform.

ROApr 15, 2019
Tightly Coupled 3D Lidar Inertial Odometry and Mapping

Haoyang Ye, Yuying Chen, Ming Liu

Ego-motion estimation is a fundamental requirement for most mobile robotic applications. By sensor fusion, we can compensate the deficiencies of stand-alone sensors and provide more reliable estimations. We introduce a tightly coupled lidar-IMU fusion method in this paper. By jointly minimizing the cost derived from lidar and IMU measurements, the lidar-IMU odometry (LIO) can perform well with acceptable drift after long-term experiment, even in challenging cases where the lidar measurements can be degraded. Besides, to obtain more reliable estimations of the lidar poses, a rotation-constrained refinement algorithm (LIO-mapping) is proposed to further align the lidar poses with the global map. The experiment results demonstrate that the proposed method can estimate the poses of the sensor pair at the IMU update rate with high precision, even under fast motion conditions or with insufficient features.

ROApr 4, 2019
Hierarchical Trajectory Planning for Autonomous Driving in Low-speed Driving Scenarios Based on RRT and Optimization

Yuying Chen, Haoyang Ye, Ming Liu

Though great effort has been put into the study of path planning on urban roads and highways, few works have studied the driving strategy and trajectory planning in low-speed driving scenarios, e.g., driving on a university campus or driving through a housing or industrial estate. The study of planning in these scenarios is crucial as these environments often cover the first or the last one kilometer of a daily travel or logistic system. Additionally, it is essential to treat these scenarios differently as, in most cases, the driving environment is narrow, dynamic, and rich with obstacles, which also causes the planning in such environments to continue to be a challenging task. This paper proposes a hierarchical planning approach that separates the path planning and the temporal planning. A path that satisfies the kinematic constraints is generated through a modified bidirectional rapidly exploring random tree (bi-RRT) approach. Following that, the timestamp of each node of the path is optimized through sequential quadratic programming (SQP) with the feasible searching bounds defined by safe intervals (SIs). Simulations and real tests in different driving scenarios prove the effectiveness of this method.

ROMar 3, 2019
Visual-based Autonomous Driving Deployment from a Stochastic and Uncertainty-aware Perspective

Lei Tai, Peng Yun, Yuying Chen et al.

End-to-end visual-based imitation learning has been widely applied in autonomous driving. When deploying the trained visual-based driving policy, a deterministic command is usually directly applied without considering the uncertainty of the input data. Such kind of policies may bring dramatical damage when applied in the real world. In this paper, we follow the recent real-to-sim pipeline by translating the testing world image back to the training domain when using the trained policy. In the translating process, a stochastic generator is used to generate various images stylized under the training domain randomly or directionally. Based on those translated images, the trained uncertainty-aware imitation learning policy would output both the predicted action and the data uncertainty motivated by the aleatoric loss function. Through the uncertainty-aware imitation learning policy, we can easily choose the safest one with the lowest uncertainty among the generated images. Experiments in the Carla navigation benchmark show that our strategy outperforms previous methods, especially in dynamic environments.

CVOct 23, 2018
Point-cloud-based place recognition using CNN feature extraction

Ting Sun, Ming Liu, Haoyang Ye et al.

This paper proposes a novel point-cloud-based place recognition system that adopts a deep learning approach for feature extraction. By using a convolutional neural network pre-trained on color images to extract features from a range image without fine-tuning on extra range images, significant improvement has been observed when compared to using hand-crafted features. The resulting system is illumination invariant, rotation invariant and robust against moving objects that are unrelated to the place identity. Apart from the system itself, we also bring to the community a new place recognition dataset containing both point cloud and grayscale images covering a full $360^\circ$ environmental view. In addition, the dataset is organized in such a way that it facilitates experimental validation with respect to rotation invariance or robustness against unrelated moving objects separately.

ROOct 19, 2017
LiDAR and Inertial Fusion for Pose Estimation by Non-linear Optimization

Haoyang Ye, Ming Liu

Pose estimation purely based on 3D point-cloud could suffer from degradation, e.g. scan blocks or scans in repetitive environments. To deal with this problem, we propose an approach for fusing 3D spinning LiDAR and IMU to estimate the ego-motion of the sensor body. The main idea of our work is to optimize the poses and states of two kinds of sensors with non-linear optimization methods. On the one hand, a bunch of IMU measurements are considered as a relative constraint using pre-integration and the state errors can be minimized with the help of laser pose estimation and non-linear optimization algorithms; on the other hand, the optimized IMU pose outputs can provide a better initial for the subsequent point-cloud matching. The method is evaluated under both simulation and real tests with comparison to the state-of-the-art. The results show that the proposed method can provide better pose estimation performance even in the degradation cases.

ROSep 22, 2017
Characterization of a RS-LiDAR for 3D Perception

Zhe Wang, Yang Liu, Qinghai Liao et al.

High precision 3D LiDARs are still expensive and hard to acquire. This paper presents the characteristics of RS-LiDAR, a model of low-cost LiDAR with sufficient supplies, in comparison with VLP-16. The paper also provides a set of evaluations to analyze the characterizations and performances of LiDARs sensors. This work analyzes multiple properties, such as drift effects, distance effects, color effects and sensor orientation effects, in the context of 3D perception. By comparing with Velodyne LiDAR, we found RS-LiDAR as a cheaper and acquirable substitute of VLP-16 with similar efficiency.

CVMar 2, 2017
Extrinsic Calibration of 3D Range Finder and Camera without Auxiliary Object or Human Intervention

Qinghai Liao, Ming Liu, Lei Tai et al.

Fusion of heterogeneous extroceptive sensors is the most effient and effective way to representing the environment precisely, as it overcomes various defects of each homogeneous sensor. The rigid transformation (aka. extrinsic parameters) of heterogeneous sensory systems should be available before precisely fusing the multisensor information. Researchers have proposed several approaches to estimating the extrinsic parameters. These approaches require either auxiliary objects, like chessboards, or extra help from human to select correspondences. In this paper, we proposed a novel extrinsic calibration approach for the extrinsic calibration of range and image sensors. As far as we know, it is the first automatic approach with no requirement of auxiliary objects or any human interventions. First, we estimate the initial extrinsic parameters from the individual motion of the range finder and the camera. Then we extract lines in the image and point-cloud pairs, to refine the line feature associations by the initial extrinsic parameters. At the end, we discussed the degenerate case which may lead to the algorithm failure and validate our approach by simulation. The results indicate high-precision extrinsic calibration results against the ground-truth.

CVOct 6, 2016
PCA-aided Fully Convolutional Networks for Semantic Segmentation of Multi-channel fMRI

Lei Tai, Haoyang Ye, Qiong Ye et al.

Semantic segmentation of functional magnetic resonance imaging (fMRI) makes great sense for pathology diagnosis and decision system of medical robots. The multi-channel fMRI provides more information of the pathological features. But the increased amount of data causes complexity in feature detections. This paper proposes a principal component analysis (PCA)-aided fully convolutional network to particularly deal with multi-channel fMRI. We transfer the learned weights of contemporary classification networks to the segmentation task by fine-tuning. The results of the convolutional network are compared with various methods e.g. k-NN. A new labeling strategy is proposed to solve the semantic segmentation problem with unclear boundaries. Even with a small-sized training dataset, the test results demonstrate that our model outperforms other pathological feature detection methods. Besides, its forward inference only takes 90 milliseconds for a single set of fMRI data. To our knowledge, this is the first time to realize pixel-wise labeling of multi-channel magnetic resonance image using FCN.