Fei Gao

h-index53

6papers

480citations

Novelty55%

AI Score47

Ranked #34,149 of 194,257 authors (top 18%)#900 in RO (top 13%)

6 Papers

5.7ROMar 6, 2025Code

Real-time Spatial-temporal Traversability Assessment via Feature-based Sparse Gaussian Process

Zhenyu Hou, Senming Tan, Zhihao Zhang et al.

Terrain analysis is critical for the practical ap- plication of ground mobile robots in real-world tasks, espe- cially in outdoor unstructured environments. In this paper, we propose a novel spatial-temporal traversability assessment method, which aims to enable autonomous robots to effectively navigate through complex terrains. Our approach utilizes sparse Gaussian processes (SGP) to extract geometric features (curvature, gradient, elevation, etc.) directly from point cloud scans. These features are then used to construct a high- resolution local traversability map. Then, we design a spatial- temporal Bayesian Gaussian kernel (BGK) inference method to dynamically evaluate traversability scores, integrating historical and real-time data while considering factors such as slope, flatness, gradient, and uncertainty metrics. GPU acceleration is applied in the feature extraction step, and the system achieves real-time performance. Extensive simulation experiments across diverse terrain scenarios demonstrate that our method outper- forms SOTA approaches in both accuracy and computational efficiency. Additionally, we develop an autonomous navigation framework integrated with the traversability map and validate it with a differential driven vehicle in complex outdoor envi- ronments. Our code will be open-source for further research and development by the community, https://github.com/ZJU-FAST-Lab/FSGP_BGK.

5.3ROMar 20, 2021Code

The Visual-Inertial-Dynamical Multirotor Dataset

Kunyi Zhang, Tiankai Yang, Ziming Ding et al.

Recently, the community has witnessed numerous datasets built for developing and testing state estimators. However, for some applications such as aerial transportation or search-and-rescue, the contact force or other disturbance must be perceived for robust planning and control, which is beyond the capacity of these datasets. This paper introduces a Visual-Inertial-Dynamical (VID) dataset, not only focusing on traditional six degrees of freedom (6-DOF) pose estimation but also providing dynamical characteristics of the flight platform for external force perception or dynamics-aided estimation. The VID dataset contains hardware synchronized imagery and inertial measurements, with accurate ground truth trajectories for evaluating common visual-inertial estimators. Moreover, the proposed dataset highlights rotor speed and motor current measurements, control inputs, and ground truth 6-axis force data to evaluate external force estimation. To the best of our knowledge, the proposed VID dataset is the first public dataset containing visual-inertial and complete dynamical information in the real world for pose and external force evaluation. The dataset: https://github.com/ZJU-FAST-Lab/VID-Dataset and related files: https://github.com/ZJU-FAST-Lab/VID-Flight-Platform are open-sourced.

29.0ROFeb 27, 2021Code

Geometrically Constrained Trajectory Optimization for Multicopters

Zhepei Wang, Xin Zhou, Chao Xu et al.

We present an optimization-based framework for multicopter trajectory planning subject to geometrical configuration constraints and user-defined dynamic constraints. The basis of the framework is a novel trajectory representation built upon our novel optimality conditions for unconstrained control effort minimization. We design linear-complexity operations on this representation to conduct spatial-temporal deformation under various planning requirements. Smooth maps are utilized to exactly eliminate geometrical constraints in a lightweight fashion. A variety of state-input constraints are supported by the decoupling of dense constraint evaluation from sparse parameterization, and backward differentiation of flatness map. As a result, this framework transforms a generally constrained multicopter planning problem into an unconstrained optimization that can be solved reliably and efficiently. Our framework bridges the gaps among solution quality, planning efficiency, and constraint fidelity for a multicopter with limited resources and maneuvering capability. Its generality and robustness are both demonstrated by applications to different flight tasks. Extensive simulations and benchmarks are also conducted to show its capability of generating high-quality solutions while retaining the computation speed against other specialized methods by orders of magnitude. The source code of our framework is available at: https://github.com/ZJU-FAST-Lab/GCOPTER

2.3DCJul 20, 2025

ACME: Adaptive Customization of Large Models via Distributed Systems

Ziming Dai, Chao Qiu, Fei Gao et al.

Pre-trained Transformer-based large models have revolutionized personal virtual assistants, but their deployment in cloud environments faces challenges related to data privacy and response latency. Deploying large models closer to the data and users has become a key research area to address these issues. However, applying these models directly often entails significant difficulties, such as model mismatching, resource constraints, and energy inefficiency. Automated design of customized models is necessary, but it faces three key challenges, namely, the high cost of centralized model customization, imbalanced performance from user heterogeneity, and suboptimal performance from data heterogeneity. In this paper, we propose ACME, an adaptive customization approach of Transformer-based large models via distributed systems. To avoid the low cost-efficiency of centralized methods, ACME employs a bidirectional single-loop distributed system to progressively achieve fine-grained collaborative model customization. In order to better match user heterogeneity, it begins by customizing the backbone generation and identifying the Pareto Front under model size constraints to ensure optimal resource utilization. Subsequently, it performs header generation and refines the model using data distribution-based personalized architecture aggregation to match data heterogeneity. Evaluation on different datasets shows that ACME achieves cost-efficient models under model size constraints. Compared to centralized systems, data transmission volume is reduced to 6 percent. Additionally, the average accuracy improves by 10 percent compared to the baseline, with the trade-off metrics increasing by nearly 30 percent.

34.6LGMar 31, 2024Code

Harnessing the Power of Large Language Model for Uncertainty Aware Graph Processing

Zhenyu Qian, Yiming Qian, Yuting Song et al.

Handling graph data is one of the most difficult tasks. Traditional techniques, such as those based on geometry and matrix factorization, rely on assumptions about the data relations that become inadequate when handling large and complex graph data. On the other hand, deep learning approaches demonstrate promising results in handling large graph data, but they often fall short of providing interpretable explanations. To equip the graph processing with both high accuracy and explainability, we introduce a novel approach that harnesses the power of a large language model (LLM), enhanced by an uncertainty-aware module to provide a confidence score on the generated answer. We experiment with our approach on two graph processing tasks: few-shot knowledge graph completion and graph classification. Our results demonstrate that through parameter efficient fine-tuning, the LLM surpasses state-of-the-art algorithms by a substantial margin across ten diverse benchmark datasets. Moreover, to address the challenge of explainability, we propose an uncertainty estimation based on perturbation, along with a calibration scheme to quantify the confidence scores of the generated answers. Our confidence measure achieves an AUC of 0.8 or higher on seven out of the ten datasets in predicting the correctness of the answer generated by LLM.

8.4CVSep 28, 2025

FastViDAR: Real-Time Omnidirectional Depth Estimation via Alternative Hierarchical Attention

Hangtian Zhao, Xiang Chen, Yizhe Li et al.

In this paper we propose FastViDAR, a novel framework that takes four fisheye camera inputs and produces a full $360^\circ$ depth map along with per-camera depth, fusion depth, and confidence estimates. Our main contributions are: (1) We introduce Alternative Hierarchical Attention (AHA) mechanism that efficiently fuses features across views through separate intra-frame and inter-frame windowed self-attention, achieving cross-view feature mixing with reduced overhead. (2) We propose a novel ERP fusion approach that projects multi-view depth estimates to a shared equirectangular coordinate system to obtain the final fusion depth. (3) We generate ERP image-depth pairs using HM3D and 2D3D-S datasets for comprehensive evaluation, demonstrating competitive zero-shot performance on real datasets while achieving up to 20 FPS on NVIDIA Orin NX embedded hardware. Project page: \href{https://3f7dfc.github.io/FastVidar/}{https://3f7dfc.github.io/FastVidar/}