RO · Nov 3, 2025
LiDAR-VGGT: Cross-Modal Coarse-to-Fine Fusion for Globally Consistent and Metric-Scale Dense Mapping
Lijie Wang, Lianjie Guo, Ziyi Xu et al.
Reconstructing large-scale colored point clouds is an important task in robotics, supporting perception, navigation, and scene understanding. Despite advances in LiDAR-inertial-visual odometry (LIVO), its performance remains highly sensitive to extrinsic calibration. Meanwhile, 3D vision foundation models, such as VGGT, suffer from limited scalability in large environments and inherently lack metric scale. To overcome these limitations, we propose LiDAR-VGGT, a novel framework that tightly couples LiDAR-inertial odometry with the state-of-the-art VGGT model through a two-stage coarse-to-fine fusion pipeline. First, a pre-fusion module with robust initialization refinement efficiently estimates VGGT poses and point clouds with coarse metric scale within each session. Then, a post-fusion module enhances the cross-modal 3D similarity transformation, using bounding-box-based regularization to reduce scale distortions caused by inconsistent FOVs between the LiDAR and camera sensors. Extensive experiments across multiple datasets demonstrate that LiDAR-VGGT achieves dense, globally consistent colored point clouds and outperforms both VGGT-based methods and LIVO baselines. The implementation of our novel colored point cloud evaluation toolkit will be released as open source.
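The 3D similarity transformation mentioned in the abstract can be illustrated with a generic closed-form alignment. The sketch below is not the LiDAR-VGGT post-fusion module: it is the standard Umeyama least-squares estimate of scale, rotation, and translation between two sets of corresponding points, without the bounding-box-based regularization the authors describe. The function name and the assumption that point correspondences are already known are illustrative only.

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Least-squares similarity transform (s, R, t) such that dst ≈ s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding points (correspondences are assumed known).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between the centered target and source points.
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt

    # Scale is the ratio of the aligned covariance to the source variance.
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

In a setting like the one described, `src` would stand for scale-free points from the vision model and `dst` for metric LiDAR points. An unregularized fit of this kind can be biased when the two sensors cover different fields of view, which is the issue the abstract says the bounding-box-based regularization targets.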
CV · Sep 28, 2025
FastViDAR: Real-Time Omnidirectional Depth Estimation via Alternative Hierarchical Attention
Hangtian Zhao, Xiang Chen, Yizhe Li et al.
In this paper, we propose FastViDAR, a novel framework that takes four fisheye camera inputs and produces a full $360^\circ$ depth map along with per-camera depth, fusion depth, and confidence estimates. Our main contributions are: (1) We introduce the Alternative Hierarchical Attention (AHA) mechanism, which efficiently fuses features across views through separate intra-frame and inter-frame windowed self-attention, achieving cross-view feature mixing with reduced overhead. (2) We propose a novel ERP fusion approach that projects multi-view depth estimates to a shared equirectangular coordinate system to obtain the final fusion depth. (3) We generate ERP image-depth pairs using the HM3D and 2D3D-S datasets for comprehensive evaluation, demonstrating competitive zero-shot performance on real datasets while achieving up to 20 FPS on NVIDIA Orin NX embedded hardware. Project page: https://3f7dfc.github.io/FastVidar/
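To make the shared equirectangular (ERP) coordinate system concrete, here is a minimal sketch of mapping 3D points to ERP pixel coordinates. It is not FastViDAR's fusion code: the axis convention (x right, y down, z forward), the function name, and the pixel layout are assumptions chosen for illustration, and the abstract does not specify the convention actually used.

```python
import numpy as np

def points_to_erp(points, width, height):
    """Map 3D points (N, 3) in a shared frame to ERP pixel coordinates and range.

    Assumed convention: x right, y down, z forward; points must be non-zero.
    Returns (u, v, r) with u in [0, width], v in [0, height], r the distance.
    """
    p = np.asarray(points, dtype=float)
    r = np.linalg.norm(p, axis=1)
    lon = np.arctan2(p[:, 0], p[:, 2])                 # longitude, 0 along +z
    lat = np.arcsin(np.clip(p[:, 1] / r, -1.0, 1.0))   # latitude, positive downward
    u = (lon / (2.0 * np.pi) + 0.5) * width
    v = (lat / np.pi + 0.5) * height
    return u, v, r
```

Under this convention a point straight ahead lands at the image center. Per-camera depth estimates, once back-projected into the shared frame, could be splatted into one ERP depth image this way before any fusion rule is applied.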
RO · Sep 10, 2021
GPA-Teleoperation: Gaze Enhanced Perception-aware Safe Assistive Aerial Teleoperation
Qianhao Wang, Botao He, Zhiren Xun et al.
Gaze is an intuitive and direct way to represent the intentions of an individual. However, in assistive aerial teleoperation, which aims to carry out the operator's intention, little attention has been paid to gaze. Existing methods obtain intention directly from the remote controller (RC) input, which is inaccurate, unstable, and unfriendly to non-professional operators. Further, most teleoperation work does not consider environment perception, which is vital to guarantee safety. In this paper, we present GPA-Teleoperation, a gaze enhanced perception-aware assistive teleoperation framework, which addresses the above issues systematically. We capture the intention using gaze information and generate a topological path matching it. Then we refine the path into a safe and feasible trajectory that simultaneously enhances perception awareness of the parts of the environment the operator is interested in. Additionally, the proposed method is integrated into a customized quadrotor system. Extensive, challenging indoor and outdoor real-world experiments and benchmark comparisons verify that the proposed system is reliable, robust, and applicable even to unskilled users. We will release the source code of our system to benefit related research.
RO · Mar 11, 2021
Visibility-aware Trajectory Optimization with Application to Aerial Tracking
Qianhao Wang, Yuman Gao, Jialin Ji et al.
The visibility of targets determines the performance and even the success rate of various applications, such as active SLAM, exploration, and target tracking. Therefore, it is crucial to take the visibility of targets into explicit account in trajectory planning. In this paper, we propose a general metric for target visibility, considering observation distance and angle as well as occlusion effects. We formulate this metric into a differentiable visibility cost function, with which the spatial trajectory and yaw can be jointly optimized. Furthermore, this visibility-aware trajectory optimization handles the dynamic feasibility of position and yaw simultaneously. To validate that our method is practical and generic, we integrate it into a customized quadrotor tracking system. The experimental results show that our visibility-aware planner performs more robustly and observes targets better. To benefit related research, we release our code to the public.
RO · Mar 10, 2021
Autonomous Flights in Dynamic Environments with Onboard Vision
Yingjian Wang, Jialin Ji, Qianhao Wang et al.
In this paper, we introduce a complete system for autonomous flight of quadrotors in dynamic environments with onboard sensing. Extending existing work, we develop an occlusion-aware dynamic perception method based on depth images, which classifies obstacles as dynamic or static. To represent generic dynamic environments, we model dynamic objects as moving ellipsoids and fuse static ones into an occupancy grid map. To achieve dynamic avoidance, we design a planning method composed of modified kinodynamic path searching and gradient-based optimization. The method leverages manually constructed gradients without maintaining a signed distance field (SDF), so that planning finishes within milliseconds. We integrate the above methods into a customized quadrotor system and thoroughly test it in real-world experiments, verifying its effective collision avoidance in dynamic environments.
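As a rough illustration of what modeling dynamic objects as moving ellipsoids allows, the sketch below checks a sampled trajectory against one ellipsoid. It is not the authors' planner: the axis-aligned ellipsoid, the constant-velocity motion model, and both function names are simplifying assumptions made here.

```python
import numpy as np

def inside_moving_ellipsoid(point, t, center0, velocity, semi_axes):
    """True if `point` lies strictly inside the ellipsoid at time t.

    Assumes an axis-aligned ellipsoid whose center moves at constant velocity.
    """
    center_t = np.asarray(center0, dtype=float) + t * np.asarray(velocity, dtype=float)
    d = (np.asarray(point, dtype=float) - center_t) / np.asarray(semi_axes, dtype=float)
    return float(d @ d) < 1.0

def first_collision_time(trajectory, times, center0, velocity, semi_axes):
    """Return the first sampled time at which the trajectory is inside the ellipsoid, else None."""
    for p, t in zip(trajectory, times):
        if inside_moving_ellipsoid(p, t, center0, velocity, semi_axes):
            return t
    return None
```

The normalized quadratic form `d @ d` is smooth in the query point, which is one reason ellipsoids are convenient for gradient-based optimization; how the paper constructs its gradients is not stated in the abstract.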
RO · Nov 8, 2020
Learning-based 3D Occupancy Prediction for Autonomous Navigation in Occluded Environments
Lizi Wang, Hongkai Ye, Qianhao Wang et al.
In autonomous navigation of mobile robots, sensors suffer from massive occlusion in cluttered environments, leaving a significant amount of space unknown during planning. In practice, treating the unknown space either optimistically or pessimistically limits planning performance, so aggressiveness and safety cannot be satisfied at the same time. However, humans can infer the exact shape of obstacles from only partial observation and generate non-conservative trajectories that avoid possible collisions in occluded space. Mimicking human behavior, in this paper we propose a method based on a deep neural network to reliably predict the occupancy distribution of unknown space. Specifically, the proposed method utilizes contextual information about the environment and learns from prior knowledge to predict obstacle distributions in occluded space. We train our network on unlabeled data without ground truth and successfully apply it to real-time navigation in unseen environments without any refinement. Results show that our method improves the performance of a kinodynamic planner by increasing safety with no reduction in speed in cluttered environments.
RO · Oct 17, 2020
Generating Large Convex Polytopes Directly on Point Clouds
Xingguang Zhong, Yuwei Wu, Dong Wang et al.
In this paper, we present a method to efficiently generate large, free, and guaranteed convex space among arbitrarily cluttered obstacles. Our method operates directly on point clouds, avoids expensive calculations, and processes thousands of points within a few milliseconds, which makes it well suited to embedded platforms. The cornerstone of our method is sphere flipping, a one-to-one invertible nonlinear transformation, which maps a set of unordered points to a nonlinear space. With these wrapped points, we obtain a collision-free star convex polytope. Then, utilizing the star convexity, we efficiently modify the polytope to be convex and guarantee that it is free of obstacles. Extensive quantitative evaluations show that our method significantly outperforms state-of-the-art works in efficiency. We also present practical applications of our method in 3D, including large-scale deformable topological mapping and quadrotor optimal trajectory planning, to validate its capability and efficiency. The source code of our method will be released for the reference of the community.
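The sphere flipping step can be sketched as follows. This uses one common form of spherical flipping from the hidden-point-removal literature, which reflects each point radially about a sphere of radius R centered at the query position; the abstract does not give the paper's exact formula, so the expression, the function names, and the requirement that R exceed every point's distance from the origin are assumptions.

```python
import numpy as np

def sphere_flip(points, radius):
    """Radially reflect points about a sphere: p -> p + 2 * (R - ||p||) * p / ||p||.

    Assumes the query position is the origin, points are non-zero, and
    radius is larger than the norm of every point.
    """
    p = np.asarray(points, dtype=float)
    norms = np.linalg.norm(p, axis=1, keepdims=True)
    return p + 2.0 * (radius - norms) * p / norms

def sphere_flip_inverse(flipped, radius):
    """Undo sphere_flip: a flipped point has norm 2R - ||p||, so rescale it back."""
    q = np.asarray(flipped, dtype=float)
    norms = np.linalg.norm(q, axis=1, keepdims=True)
    return q * (2.0 * radius - norms) / norms
```

Each point keeps its direction while its distance becomes 2R minus the original distance, so points near the origin end up farthest away. A convex hull of the flipped points (for example via `scipy.spatial.ConvexHull`) would then select points that map back to a star-shaped region around the origin, in the spirit of the pipeline the abstract outlines.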