Fu Zhang

RO
h-index26
44papers
5,258citations
Novelty53%
AI Score62

44 Papers

ROAug 26, 2024Code
FAST-LIVO2: Fast, Direct LiDAR-Inertial-Visual Odometry

Chunran Zheng, Wei Xu, Zuhao Zou et al.

This paper proposes FAST-LIVO2: a fast, direct LiDAR-inertial-visual odometry framework to achieve accurate and robust state estimation in SLAM tasks and provide great potential in real-time, onboard robotic applications. FAST-LIVO2 fuses the IMU, LiDAR and image measurements efficiently through an ESIKF. To address the dimension mismatch between the heterogeneous LiDAR and image measurements, we use a sequential update strategy in the Kalman filter. To enhance the efficiency, we use direct methods for both the visual and LiDAR fusion, where the LiDAR module registers raw points without extracting edge or plane features and the visual module minimizes direct photometric errors without extracting ORB or FAST corner features. The fusion of both visual and LiDAR measurements is based on a single unified voxel map where the LiDAR module constructs the geometric structure for registering new LiDAR scans and the visual module attaches image patches to the LiDAR points. To enhance the accuracy of image alignment, we use plane priors from the LiDAR points in the voxel map (and even refine the plane prior) and update the reference patch dynamically after new images are aligned. Furthermore, to enhance the robustness of image alignment, FAST-LIVO2 employs an on-demanding raycast operation and estimates the image exposure time in real time. Lastly, we detail three applications of FAST-LIVO2: UAV onboard navigation demonstrating the system's computation efficiency for real-time onboard navigation, airborne mapping showcasing the system's mapping accuracy, and 3D model rendering (mesh-based and NeRF-based) underscoring the suitability of our reconstructed dense map for subsequent rendering tasks. We open source our code, dataset and application on GitHub to benefit the robotics community.

CVSep 26, 2022Code
STD: Stable Triangle Descriptor for 3D place recognition

Chongjian Yuan, Jiarong Lin, Zuhao Zou et al.

In this work, we present a novel global descriptor termed stable triangle descriptor (STD) for 3D place recognition. For a triangle, its shape is uniquely determined by the length of the sides or included angles. Moreover, the shape of triangles is completely invariant to rigid transformations. Based on this property, we first design an algorithm to efficiently extract local key points from the 3D point cloud and encode these key points into triangular descriptors. Then, place recognition is achieved by matching the side lengths (and some other information) of the descriptors between point clouds. The point correspondence obtained from the descriptor matching pair can be further used in geometric verification, which greatly improves the accuracy of place recognition. In our experiments, we extensively compare our proposed system against other state-of-the-art systems (i.e., M2DP, Scan Context) on public datasets (i.e., KITTI, NCLT, and Complex-Urban) and our self-collected dataset (with a non-repetitive scanning solid-state LiDAR). All the quantitative results show that STD has stronger adaptability and a great improvement in precision over its counterparts. To share our findings and make contributions to the community, we open source our code on our GitHub: https://github.com/hku-mars/STD.

ROJan 12, 2023Code
ImMesh: An Immediate LiDAR Localization and Meshing Framework

Jiarong Lin, Chongjiang Yuan, Yixi Cai et al.

In this paper, we propose a novel LiDAR(-inertial) odometry and mapping framework to achieve the goal of simultaneous localization and meshing in real-time. This proposed framework termed ImMesh comprises four tightly-coupled modules: receiver, localization, meshing, and broadcaster. The localization module utilizes the prepossessed sensor data from the receiver, estimates the sensor pose online by registering LiDAR scans to maps, and dynamically grows the map. Then, our meshing module takes the registered LiDAR scan for incrementally reconstructing the triangle mesh on the fly. Finally, the real-time odometry, map, and mesh are published via our broadcaster. The key contribution of this work is the meshing module, which represents a scene by an efficient hierarchical voxels structure, performs fast finding of voxels observed by new scans, and reconstructs triangle facets in each voxel in an incremental manner. This voxel-wise meshing operation is delicately designed for the purpose of efficiency; it first performs a dimension reduction by projecting 3D points to a 2D local plane contained in the voxel, and then executes the meshing operation with pull, commit and push steps for incremental reconstruction of triangle facets. To the best of our knowledge, this is the first work in literature that can reconstruct online the triangle mesh of large-scale scenes, just relying on a standard CPU without GPU acceleration. To share our findings and make contributions to the community, we make our code publicly available on our GitHub: https://github.com/hku-mars/ImMesh.

CVSep 8, 2022Code
R$^3$LIVE++: A Robust, Real-time, Radiance reconstruction package with a tightly-coupled LiDAR-Inertial-Visual state Estimator

Jiarong Lin, Fu Zhang

Simultaneous localization and mapping (SLAM) are crucial for autonomous robots (e.g., self-driving cars, autonomous drones), 3D mapping systems, and AR/VR applications. This work proposed a novel LiDAR-inertial-visual fusion framework termed R$^3$LIVE++ to achieve robust and accurate state estimation while simultaneously reconstructing the radiance map on the fly. R$^3$LIVE++ consists of a LiDAR-inertial odometry (LIO) and a visual-inertial odometry (VIO), both running in real-time. The LIO subsystem utilizes the measurements from a LiDAR for reconstructing the geometric structure (i.e., the positions of 3D points), while the VIO subsystem simultaneously recovers the radiance information of the geometric structure from the input images. R$^3$LIVE++ is developed based on R$^3$LIVE and further improves the accuracy in localization and mapping by accounting for the camera photometric calibration (e.g., non-linear response function and lens vignetting) and the online estimation of camera exposure time. We conduct more extensive experiments on both public and our private datasets to compare our proposed system against other state-of-the-art SLAM systems. Quantitative and qualitative results show that our proposed system has significant improvements over others in both accuracy and robustness. In addition, to demonstrate the extendability of our work, {we developed several applications based on our reconstructed radiance maps, such as high dynamic range (HDR) imaging, virtual environment exploration, and 3D video gaming.} Lastly, to share our findings and make contributions to the community, we make our codes, hardware design, and dataset publicly available on our Github: github.com/hku-mars/r3live

ROSep 9, 2024Code
Neural Surface Reconstruction and Rendering for LiDAR-Visual Systems

Jianheng Liu, Chunran Zheng, Yunfei Wan et al.

This paper presents a unified surface reconstruction and rendering framework for LiDAR-visual systems, integrating Neural Radiance Fields (NeRF) and Neural Distance Fields (NDF) to recover both appearance and structural information from posed images and point clouds. We address the structural visible gap between NeRF and NDF by utilizing a visible-aware occupancy map to classify space into the free, occupied, visible unknown, and background regions. This classification facilitates the recovery of a complete appearance and structure of the scene. We unify the training of the NDF and NeRF using a spatial-varying scale SDF-to-density transformation for levels of detail for both structure and appearance. The proposed method leverages the learned NDF for structure-aware NeRF training by an adaptive sphere tracing sampling strategy for accurate structure rendering. In return, NeRF further refines structural in recovering missing or fuzzy structures in the NDF. Extensive experiments demonstrate the superior quality and versatility of the proposed method across various scenarios. To benefit the community, the codes will be released at \url{https://github.com/hku-mars/M2Mapping}.

SYMar 15, 2019
Full Attitude Control of an Efficient Quadrotor Tail-sitter VTOL UAV with Flexible Modes

Wei Xu, Haowei Gu, Youming Qing et al.

In this paper, we present a full attitude control of an efficient quadrotor tail-sitter VTOL UAV with flexible modes. This control system is working in all flight modes without any control surfaces but motor differential thrusts. This paper concentrates on the design of the attitude controller and the altitude controller. For the attitude control, the controller's parameters and filters are optimized based on the frequency response model which is identified from the sweep experiment. As a result, the effect of system flexible modes is easily compensated in frequency-domain by using a notch filter, and the resulting attitude loop shows superior tracking performance and robustness. In the coordinated flight condition, the altitude controller is structured as the feedforward-feedback parallel controller. The feedforward thrust command is calculated based on the current speed and the pitch angle. Tests in hovering, forward accelerating and forward decelerating flights have been conducted to verify the proposed control system.

41.1ROMar 23Code
Memory-Efficient Boundary Map for Large-Scale Occupancy Grid Mapping

Benxu Tang, Yunfan Ren, Yixi Cai et al.

Determining the occupancy status of locations in the environment is a fundamental task for safety-critical robotic applications. Traditional occupancy grid mapping methods subdivide the environment into a grid of voxels, each associated with one of three occupancy states: free, occupied, or unknown. These methods explicitly maintain all voxels within the mapped volume and determine the occupancy state of a location by directly querying the corresponding voxel that the location falls within. However, maintaining all grid voxels in high-resolution and large-scale scenarios requires substantial memory resources. In this paper, we introduce a novel representation that only maintains the boundary of the mapped volume. Specifically, we explicitly represent the boundary voxels, such as the occupied voxels and frontier voxels, while free and unknown voxels are automatically represented by volumes within or outside the boundary, respectively. As our representation maintains only a closed surface in two-dimensional (2D) space, instead of the entire volume in three-dimensional (3D) space, it significantly reduces memory consumption. Then, based on this 2D representation, we propose a method to determine the occupancy state of arbitrary locations in the 3D environment. We term this method as boundary map. Besides, we design a novel data structure for maintaining the boundary map, supporting efficient occupancy state queries. Theoretical analyses of the occupancy state query algorithm are also provided. Furthermore, to enable efficient construction and updates of the boundary map from the real-time sensor measurements, we propose a global-local mapping framework and corresponding update algorithms. Finally, we will make our implementation of the boundary map open-source on GitHub to benefit the community:https://github.com/hku-mars/BDM.

CVAug 21, 2024
Visual Localization in 3D Maps: Comparing Point Cloud, Mesh, and NeRF Representations

Lintong Zhang, Yifu Tao, Jiarong Lin et al.

Recent advances in mapping techniques have enabled the creation of highly accurate dense 3D maps during robotic missions, such as point clouds, meshes, or NeRF-based representations. These developments present new opportunities for reusing these maps for localization. However, there remains a lack of a unified approach that can operate seamlessly across different map representations. This paper presents and evaluates a global visual localization system capable of localizing a single camera image across various 3D map representations built using both visual and lidar sensing. Our system generates a database by synthesizing novel views of the scene, creating RGB and depth image pairs. Leveraging the precise 3D geometric map, our method automatically defines rendering poses, reducing the number of database images while preserving retrieval performance. To bridge the domain gap between real query camera images and synthetic database images, our approach utilizes learning-based descriptors and feature detectors. We evaluate the system's performance through extensive real-world experiments conducted in both indoor and outdoor settings, assessing the effectiveness of each map representation and demonstrating its advantages over traditional structure-from-motion (SfM) localization approaches. The results show that all three map representations can achieve consistent localization success rates of 55% and higher across various environments. NeRF synthesized images show superior performance, localizing query images at an average success rate of 72%. Furthermore, we demonstrate an advantage over SfM-based approaches that our synthesized database enables localization in the reverse travel direction which is unseen during the mapping process. Our system, operating in real-time on a mobile laptop equipped with a GPU, achieves a processing rate of 1Hz.

21.1ROApr 14
D-BDM: A Direct and Efficient Boundary-Based Occupancy Grid Mapping Framework for LiDARs

Benxu Tang, Yixi Cai, Fanze Kong et al.

Efficient and scalable 3D occupancy mapping is essential for autonomous robot applications in unknown environments. However, traditional occupancy grid representations suffer from two fundamental limitations. First, explicitly storing all voxels in three-dimensional space leads to prohibitive memory consumption. Second, exhaustive ray casting incurs high update latency. A recent representation alleviate memory demands by maintaining only the voxels on the two-dimensional boundary, yet they still rely on full ray casting updates. This work advances the boundary-based framework with a highly efficient update scheme. We introduce a truncated ray casting strategy that restricts voxel traversal to the exterior of the boundary, which dramatically reduces the number of updated voxels. In addition, we propose a direct boundary update mechanism that removes the need for an auxiliary local 3D occupancy grid, further reducing memory usage and simplifying the map update pipeline. We name our framework as D-BDM. Extensive evaluations on public datasets demonstrate that our approach achieves significantly lower update time and reduced memory consumption compared with the baseline methods, as well as the prior boundary-based approach.

CVApr 29, 2023
An Efficient Plane Extraction Approach for Bundle Adjustment on LiDAR Point clouds

Zheng Liu, Fu Zhang

Bundle adjustment (BA) on LiDAR point clouds has been extensively investigated in recent years due to its ability to optimize multiple poses together, resulting in high accuracy and global consistency for point cloud. However, the accuracy and speed of LiDAR bundle adjustment depend on the quality of plane extraction, which provides point association for LiDAR BA. In this study, we propose a novel and efficient voxel-based approach for plane extraction that is specially designed to provide point association for LiDAR bundle adjustment. To begin, we partition the space into multiple voxels of a fixed size and then split these root voxels based on whether the points are on the same plane, using an octree structure. We also design a novel plane determination method based on principle component analysis (PCA), which segments the points into four even quarters and compare their minimum eigenvalues with that of the initial point cloud. Finally, we adopt a plane merging method to prevent too many small planes from being in a single voxel, which can increase the optimization time required for BA. Our experimental results on HILTI demonstrate that our approach achieves the best precision and least time cost compared to other plane extraction methods.

ROMar 13, 2025Code
GS-SDF: LiDAR-Augmented Gaussian Splatting and Neural SDF for Geometrically Consistent Rendering and Reconstruction

Jianheng Liu, Yunfei Wan, Bowen Wang et al.

Digital twins are fundamental to the development of autonomous driving and embodied artificial intelligence. However, achieving high-granularity surface reconstruction and high-fidelity rendering remains a challenge. Gaussian splatting offers efficient photorealistic rendering but struggles with geometric inconsistencies due to fragmented primitives and sparse observational data in robotics applications. Existing regularization methods, which rely on render-derived constraints, often fail in complex environments. Moreover, effectively integrating sparse LiDAR data with Gaussian splatting remains challenging. We propose a unified LiDAR-visual system that synergizes Gaussian splatting with a neural signed distance field. The accurate LiDAR point clouds enable a trained neural signed distance field to offer a manifold geometry field. This motivates us to offer an SDF-based Gaussian initialization for physically grounded primitive placement and a comprehensive geometric regularization for geometrically consistent rendering and reconstruction. Experiments demonstrate superior reconstruction accuracy and rendering quality across diverse trajectories. To benefit the community, the codes are released at https://github.com/hku-mars/GS-SDF.

CLApr 18, 2024Code
NALA: an Effective and Interpretable Entity Alignment Method

Chuanhao Xu, Jingwei Cheng, Fu Zhang

Entity alignment (EA) aims to find equivalent entities between two Knowledge Graphs. Existing embedding-based EA methods usually encode entities as embeddings, triples as embeddings' constraint and learn to align the embeddings. However, the details of the underlying logical inference steps among the alignment process are usually omitted, resulting in inadequate inference process. In this paper, we introduce NALA, an entity alignment method that captures three types of logical inference paths with Non-Axiomatic Logic (NAL). Type 1&2 align the entity pairs and type 3 aligns relations. NALA iteratively aligns entities and relations by integrating the conclusions of the inference paths. Our method is logically interpretable and extensible by introducing NAL, and thus suitable for various EA settings. Experimental results show that NALA outperforms state-of-the-art methods in terms of Hits@1, achieving 0.98+ on all three datasets of DBP15K with both supervised and unsupervised settings. We offer a pioneering in-depth analysis of the fundamental principles of entity alignment, approaching the subject from a unified and logical perspective. Our code is available at https://github.com/13998151318/NALA.

CVFeb 5
Sparse Video Generation Propels Real-World Beyond-the-View Vision-Language Navigation

Hai Zhang, Siqi Liang, Li Chen et al.

Why must vision-language navigation be bound to detailed and verbose language instructions? While such details ease decision-making, they fundamentally contradict the goal for navigation in the real-world. Ideally, agents should possess the autonomy to navigate in unknown environments guided solely by simple and high-level intents. Realizing this ambition introduces a formidable challenge: Beyond-the-View Navigation (BVN), where agents must locate distant, unseen targets without dense and step-by-step guidance. Existing large language model (LLM)-based methods, though adept at following dense instructions, often suffer from short-sighted behaviors due to their reliance on short-horimzon supervision. Simply extending the supervision horizon, however, destabilizes LLM training. In this work, we identify that video generation models inherently benefit from long-horizon supervision to align with language instructions, rendering them uniquely suitable for BVN tasks. Capitalizing on this insight, we propose introducing the video generation model into this field for the first time. Yet, the prohibitive latency for generating videos spanning tens of seconds makes real-world deployment impractical. To bridge this gap, we propose SparseVideoNav, achieving sub-second trajectory inference guided by a generated sparse future spanning a 20-second horizon. This yields a remarkable 27x speed-up compared to the unoptimized counterpart. Extensive real-world zero-shot experiments demonstrate that SparseVideoNav achieves 2.5x the success rate of state-of-the-art LLM baselines on BVN tasks and marks the first realization of such capability in challenging night scenes.

CVDec 22, 2025
6DAttack: Backdoor Attacks in the 6DoF Pose Estimation

Jihui Guo, Zongmin Zhang, Zhen Sun et al.

Deep learning advances have enabled accurate six-degree-of-freedom (6DoF) object pose estimation, widely used in robotics, AR/VR, and autonomous systems. However, backdoor attacks pose significant security risks. While most research focuses on 2D vision, 6DoF pose estimation remains largely unexplored. Unlike traditional backdoors that only change classes, 6DoF attacks must control continuous parameters like translation and rotation, rendering 2D methods inapplicable. We propose 6DAttack, a framework using 3D object triggers to induce controlled erroneous poses while maintaining normal behavior. Evaluations on PVNet, DenseFusion, and PoseDiffusion across LINEMOD, YCB-Video, and CO3D show high attack success rates (ASRs) without compromising clean performance. Backdoored models achieve up to 100% clean ADD accuracy and 100% ASR, with triggered samples reaching 97.70% ADD-P. Furthermore, a representative defense remains ineffective. Our findings reveal a serious, underexplored threat to 6DoF pose estimation.

CVApr 28, 2025Code
Mesh-Learner: Texturing Mesh with Spherical Harmonics

Yunfei Wan, Jianheng Liu, Chunran Zheng et al.

In this paper, we present a 3D reconstruction and rendering framework termed Mesh-Learner that is natively compatible with traditional rasterization pipelines. It integrates mesh and spherical harmonic (SH) texture (i.e., texture filled with SH coefficients) into the learning process to learn each mesh s view-dependent radiance end-to-end. Images are rendered by interpolating surrounding SH Texels at each pixel s sampling point using a novel interpolation method. Conversely, gradients from each pixel are back-propagated to the related SH Texels in SH textures. Mesh-Learner exploits graphic features of rasterization pipeline (texture sampling, deferred rendering) to render, which makes Mesh-Learner naturally compatible with tools (e.g., Blender) and tasks (e.g., 3D reconstruction, scene rendering, reinforcement learning for robotics) that are based on rasterization pipelines. Our system can train vast, unlimited scenes because we transfer only the SH textures within the frustum to the GPU for training. At other times, the SH textures are stored in CPU RAM, which results in moderate GPU memory usage. The rendering results on interpolation and extrapolation sequences in the Replica and FAST-LIVO2 datasets achieve state-of-the-art performance compared to existing state-of-the-art methods (e.g., 3D Gaussian Splatting and M2-Mapping). To benefit the society, the code will be available at https://github.com/hku-mars/Mesh-Learner.

ROFeb 22, 2022Code
Robust Real-time LiDAR-inertial Initialization

Fangcheng Zhu, Yunfan Ren, Fu Zhang

For most LiDAR-inertial odometry, accurate initial states, including temporal offset and extrinsic transformation between LiDAR and 6-axis IMUs, play a significant role and are often considered as prerequisites. However, such information may not be always available in customized LiDAR-inertial systems. In this paper, we propose LI-Init: a full and real-time LiDAR-inertial system initialization process that calibrates the temporal offset and extrinsic parameter between LiDARs and IMUs, and also the gravity vector and IMU bias by aligning the state estimated from LiDAR measurements with that measured by IMU. We implement the proposed method as an initialization module, which, if enabled, automatically detects the degree of excitation of the collected data and calibrate, on-the-fly, the temporal offset, extrinsic, gravity vector, and IMU bias, which are then used as high-quality initial state values for real-time LiDAR-inertial odometry systems. Experiments conducted with different types of LiDARs and LiDAR-inertial combinations show the robustness, adaptability and efficiency of our initialization method. The implementation of our LiDAR-inertial initialization procedure LI-Init and test data are open-sourced on Github and also integrated into a state-of-the-art LiDAR-inertial odometry system FAST-LIO2.

ROSep 15, 2021Code
Efficient and Probabilistic Adaptive Voxel Mapping for Accurate Online LiDAR Odometry

Chongjian Yuan, Wei xu, Xiyuan Liu et al.

This paper proposes an efficient and probabilistic adaptive voxel mapping method for LiDAR odometry. The map is a collection of voxels; each contains one plane (or edge) feature that enables the probabilistic representation of the environment and accurate registration of a new LiDAR scan. We further analyze the need for coarse-to-fine voxel mapping and then use a novel voxel map organized by a Hash table and octrees to build and update the map efficiently. We apply the proposed voxel map to an iterated extended Kalman filter and construct a maximum a posteriori probability problem for pose estimation. Experiments on the open KITTI dataset show the high accuracy and efficiency of our method compared to other state-of-the-art methods. Outdoor experiments on unstructured environments with non-repetitive scanning LiDARs further verify the adaptability of our mapping method to different environments and LiDAR scanning patterns. Our codes and dataset are open-sourced on Github

ROSep 10, 2021Code
R3LIVE: A Robust, Real-time, RGB-colored, LiDAR-Inertial-Visual tightly-coupled state Estimation and mapping package

Jiarong Lin, Fu Zhang

In this letter, we propose a novel LiDAR-Inertial-Visual sensor fusion framework termed R3LIVE, which takes advantage of measurement of LiDAR, inertial, and visual sensors to achieve robust and accurate state estimation. R3LIVE is contained of two subsystems, the LiDAR-inertial odometry (LIO) and visual-inertial odometry (VIO). The LIO subsystem (FAST-LIO) takes advantage of the measurement from LiDAR and inertial sensors and builds the geometry structure of (i.e. the position of 3D points) global maps. The VIO subsystem utilizes the data of visual-inertial sensors and renders the map's texture (i.e. the color of 3D points). More specifically, the VIO subsystem fuses the visual data directly and effectively by minimizing the frame-to-map photometric error. The developed system R3LIVE is developed based on our previous work R2LIVE, with careful architecture design and implementation. Experiment results show that the resultant system achieves more robustness and higher accuracy in state estimation than current counterparts (see our attached video). R3LIVE is a versatile and well-engineered system toward various possible applications, which can not only serve as a SLAM system for real-time robotic applications, but can also reconstruct the dense, precise, RGB-colored 3D maps for applications like surveying and mapping. Moreover, to make R3LIVE more extensible, we develop a series of offline utilities for reconstructing and texturing meshes, which further minimizes the gap between R3LIVE and various of 3D applications such as simulators, video games and etc (see our demos video). To share our findings and make contributions to the community, we open source R3LIVE on our Github, including all of our codes, software utilities, and the mechanical design of our device.

ROJul 14, 2021Code
FAST-LIO2: Fast Direct LiDAR-inertial Odometry

Wei Xu, Yixi Cai, Dongjiao He et al.

This paper presents FAST-LIO2: a fast, robust, and versatile LiDAR-inertial odometry framework. Building on a highly efficient tightly-coupled iterated Kalman filter, FAST-LIO2 has two key novelties that allow fast, robust, and accurate LiDAR navigation (and mapping). The first one is directly registering raw points to the map (and subsequently update the map, i.e., mapping) without extracting features. This enables the exploitation of subtle features in the environment and hence increases the accuracy. The elimination of a hand-engineered feature extraction module also makes it naturally adaptable to emerging LiDARs of different scanning patterns; The second main novelty is maintaining a map by an incremental k-d tree data structure, ikd-Tree, that enables incremental updates (i.e., point insertion, delete) and dynamic re-balancing. Compared with existing dynamic data structures (octree, R*-tree, nanoflann k-d tree), ikd-Tree achieves superior overall performance while naturally supports downsampling on the tree. We conduct an exhaustive benchmark comparison in 19 sequences from a variety of open LiDAR datasets. FAST-LIO2 achieves consistently higher accuracy at a much lower computation load than other state-of-the-art LiDAR-inertial navigation systems. Various real-world experiments on solid-state LiDARs with small FoV are also conducted. Overall, FAST-LIO2 is computationally-efficient (e.g., up to 100 Hz odometry and mapping in large outdoor environments), robust (e.g., reliable pose estimation in cluttered indoor environments with rotation up to 1000 deg/s), versatile (i.e., applicable to both multi-line spinning and solid-state LiDARs, UAV and handheld platforms, and Intel and ARM-based processors), while still achieving higher accuracy than existing methods. Our implementation of the system FAST-LIO2, and the data structure ikd-Tree are both open-sourced on Github.

ROMar 2, 2021Code
Pixel-level Extrinsic Self Calibration of High Resolution LiDAR and Camera in Targetless Environments

Chongjian Yuan, Xiyuan Liu, Xiaoping Hong et al.

In this letter, we present a novel method for automatic extrinsic calibration of high-resolution LiDARs and RGB cameras in targetless environments. Our approach does not require checkerboards but can achieve pixel-level accuracy by aligning natural edge features in the two sensors. On the theory level, we analyze the constraints imposed by edge features and the sensitivity of calibration accuracy with respect to edge distribution in the scene. On the implementation level, we carefully investigate the physical measuring principles of LiDARs and propose an efficient and accurate LiDAR edge extraction method based on point cloud voxel cutting and plane fitting. Due to the edges' richness in natural scenes, we have carried out experiments in many indoor and outdoor scenes. The results show that this method has high robustness, accuracy, and consistency. It can promote the research and application of the fusion between LiDAR and camera. We have open-sourced our code on GitHub to benefit the community.

ROFeb 28, 2021Code
Avoiding dynamic small obstacles with onboard sensing and computating on aerial robots

Fanze Kong, Wei Xu, Fu Zhang

In practical applications, autonomous quadrotors are still facing significant challenges, such as the detection and avoidance of very small and even dynamic obstacles (e.g., tree branches, power lines). In this paper, we propose a compact, integrated, and fully autonomous quadrotor system, which can fly safely in cluttered environments while avoiding dynamic small obstacles. Our quadrotor platform is equipped with a forward-looking three-dimensional (3D) light detection and ranging (lidar) sensor to perceive the environment and an onboard embedded computer to perform all the estimation, mapping, and planning tasks. Specifically, the computer estimates the current pose of the UAV, maintains a local map (time-accumulated point clouds KD-Trees), and computes a safe trajectory using kinodynamic A* search to the goal point. The whole perception and planning system can run onboard at 50Hz with careful optimization. Various indoor and outdoor experiments show that the system can avoid dynamic small obstacles (down to 20mm diameter bar) while flying at 2m/s in cluttered environments. Our codes and hardware design are open-sourced on Github.

ROFeb 24, 2021Code
R2LIVE: A Robust, Real-time, LiDAR-Inertial-Visual tightly-coupled state Estimator and mapping

Jiarong Lin, Chunran Zheng, Wei Xu et al.

In this letter, we propose a robust, real-time tightly-coupled multi-sensor fusion framework, which fuses measurement from LiDAR, inertial sensor, and visual camera to achieve robust and accurate state estimation. Our proposed framework is composed of two parts: the filter-based odometry and factor graph optimization. To guarantee real-time performance, we estimate the state within the framework of error-state iterated Kalman-filter, and further improve the overall precision with our factor graph optimization. Taking advantage of measurement from all individual sensors, our algorithm is robust enough to various visual failure, LiDAR-degenerated scenarios, and is able to run in real-time on an on-board computation platform, as shown by extensive experiments conducted in indoor, outdoor, and mixed environment of different scale. Moreover, the results show that our proposed framework can improve the accuracy of state-of-the-art LiDAR-inertial or visual-inertial odometry. To share our findings and to make contributions to the community, we open source our codes on our Github.

ROFeb 7, 2021Code
Kalman Filters on Differentiable Manifolds

Dongjiao He, Wei Xu, Fu Zhang

Kalman filter is presumably one of the most important and extensively used filtering techniques in modern control systems. Yet, nearly all current variants of Kalman filters are formulated in the Euclidean space $\mathbb{R}^n$, while many real-world systems (e.g., robotic systems) are really evolving on manifolds. In this paper, we propose a method to develop Kalman filters for such on-manifold systems. Utilizing $\boxplus$, $\boxminus$ operations and further defining an oplus operation on the respective manifold, we propose a canonical representation of the on-manifold system. Such a canonical form enables us to separate the manifold constraints from the system behaviors in each step of the Kalman filter, ultimately leading to a generic and symbolic Kalman filter framework that are naturally evolving on the manifold. Furthermore, the on-manifold Kalman filter is implemented as a toolkit in $C$++ packages which enables users to implement an on-manifold Kalman filter just like the normal one in $\mathbb{R}^n$: the user needs only to provide the system-specific descriptions, and then call the respective filter steps (e.g., predict, update) without dealing with any of the manifold constraints. The existing implementation supports full iterated Kalman filtering for systems on any manifold composed of $\mathbb{R}^n$, $SO(3)$ and $\mathbb{S}^2$, and is extendable to other types of manifold when necessary. The proposed symbolic Kalman filter and the developed toolkit are verified by implementing a tightly-coupled lidar-inertial navigation system. Results show that the developed toolkit leads to superior filtering performances and computation efficiency comparable to hand-engineered counterparts. Finally, the toolkit is opened sourced at https://github.com/hku-mars/IKFoM to assist practitioners to quickly deploy an on-manifold Kalman filter.

ROOct 16, 2020Code
BALM: Bundle Adjustment for Lidar Mapping

Zheng Liu, Fu Zhang

A local Bundle Adjustment (BA) on a sliding window of keyframes has been widely used in visual SLAM and proved to be very effective in lowering the drift. But in lidar SLAM, BA method is hardly used because the sparse feature points (e.g., edge and plane) make the exact point matching impossible. In this paper, we formulate the lidar BA as minimizing the distance from a feature point to its matched edge or plane. Unlike the visual SLAM (and prior plane adjustment method in lidar SLAM) where the feature has to be co-determined along with the pose, we show that the feature can be analytically solved and removed from the BA, the resultant BA is only dependent on the scan poses. This greatly reduces the optimization scale and allows large-scale dense plane and edge features to be used. To speedup the optimization, we derive the analytical derivatives of the cost function, up to second order, in closed form. Moreover, we propose a novel adaptive voxelization method to search feature correspondence efficiently. The proposed formulations are incorporated into a LOAM back-end for map refinement. Results show that, although as a back-end, the local BA can be solved very efficiently, even in real-time at 10Hz when optimizing 20 scans of point-cloud. The local BA also considerably lowers the LOAM drift. Our implementation of the BA optimization and LOAM are open-sourced to benefit the community.

ROOct 16, 2020Code
FAST-LIO: A Fast, Robust LiDAR-inertial Odometry Package by Tightly-Coupled Iterated Kalman Filter

Wei Xu, Fu Zhang

This paper presents a computationally efficient and robust LiDAR-inertial odometry framework. We fuse LiDAR feature points with IMU data using a tightly-coupled iterated extended Kalman filter to allow robust navigation in fast-motion, noisy or cluttered environments where degeneration occurs. To lower the computation load in the presence of large number of measurements, we present a new formula to compute the Kalman gain. The new formula has computation load depending on the state dimension instead of the measurement dimension. The proposed method and its implementation are tested in various indoor and outdoor environments. In all tests, our method produces reliable navigation results in real-time: running on a quadrotor onboard computer, it fuses more than 1,200 effective feature points in a scan and completes all iterations of an iEKF step within 25 ms. Our codes are open-sourced on Github.

ROSep 25, 2019Code
A fast, complete, point cloud based loop closure for LiDAR odometry and mapping

Jiarong Lin, Fu Zhang

This paper presents a loop closure method to correct the long-term drift in LiDAR odometry and mapping (LOAM). Our proposed method computes the 2D histogram of keyframes, a local map patch, and uses the normalized cross-correlation of the 2D histograms as the similarity metric between the current keyframe and those in the map. We show that this method is fast, invariant to rotation, and produces reliable and accurate loop detection. The proposed method is implemented with careful engineering and integrated into the LOAM algorithm, forming a complete and practical system ready to use. To benefit the community by serving a benchmark for loop closure, the entire system is made open source on Github

ROSep 15, 2019Code
Loam_livox: A fast, robust, high-precision LiDAR odometry and mapping package for LiDARs of small FoV

Jiarong Lin, Fu Zhang

LiDAR odometry and mapping (LOAM) has been playing an important role in autonomous vehicles, due to its ability to simultaneously localize the robot's pose and build high-precision, high-resolution maps of the surrounding environment. This enables autonomous navigation and safe path planning of autonomous vehicles. In this paper, we present a robust, real-time LOAM algorithm for LiDARs with small FoV and irregular samplings. By taking effort on both front-end and back-end, we address several fundamental challenges arising from such LiDARs, and achieve better performance in both precision and efficiency compared to existing baselines. To share our findings and to make contributions to the community, we open source our codes on Github

CLOct 17, 2024
Attr-Int: A Simple and Effective Entity Alignment Framework for Heterogeneous Knowledge Graphs

Linyan Yang, Jingwei Cheng, Chuanhao Xu et al. · pku

Entity alignment (EA) refers to the task of linking entities in different knowledge graphs (KGs). Existing EA methods rely heavily on structural isomorphism. However, in real-world KGs, aligned entities usually have non-isomorphic neighborhood structures, which paralyses the application of these structure-dependent methods. In this paper, we investigate and tackle the problem of entity alignment between heterogeneous KGs. First, we propose two new benchmarks to closely simulate real-world EA scenarios of heterogeneity. Then we conduct extensive experiments to evaluate the performance of representative EA methods on the new benchmarks. Finally, we propose a simple and effective entity alignment framework called Attr-Int, in which innovative attribute information interaction methods can be seamlessly integrated with any embedding encoder for entity alignment, improving the performance of existing entity alignment techniques. Experiments demonstrate that our framework outperforms the state-of-the-art approaches on two new benchmarks.

ROJan 15, 2025
GS-LIVO: Real-Time LiDAR, Inertial, and Visual Multi-sensor Fused Odometry with Gaussian Mapping

Sheng Hong, Chunran Zheng, Yishu Shen et al.

In recent years, 3D Gaussian splatting (3D-GS) has emerged as a novel scene representation approach. However, existing vision-only 3D-GS methods often rely on hand-crafted heuristics for point-cloud densification and face challenges in handling occlusions and high GPU memory and computation consumption. LiDAR-Inertial-Visual (LIV) sensor configuration has demonstrated superior performance in localization and dense mapping by leveraging complementary sensing characteristics: rich texture information from cameras, precise geometric measurements from LiDAR, and high-frequency motion data from IMU. Inspired by this, we propose a novel real-time Gaussian-based simultaneous localization and mapping (SLAM) system. Our map system comprises a global Gaussian map and a sliding window of Gaussians, along with an IESKF-based odometry. The global Gaussian map consists of hash-indexed voxels organized in a recursive octree, effectively covering sparse spatial volumes while adapting to different levels of detail and scales. The Gaussian map is initialized through multi-sensor fusion and optimized with photometric gradients. Our system incrementally maintains a sliding window of Gaussians, significantly reducing GPU computation and memory consumption by only optimizing the map within the sliding window. Moreover, we implement a tightly coupled multi-sensor fusion odometry with an iterative error state Kalman filter (IESKF), leveraging real-time updating and rendering of the Gaussian map. Our system represents the first real-time Gaussian-based SLAM framework deployable on resource-constrained embedded systems, demonstrated on the NVIDIA Jetson Orin NX platform. The framework achieves real-time performance while maintaining robust multi-sensor fusion capabilities. All implementation algorithms, hardware designs, and CAD models will be publicly available.

AIOct 17, 2025
Multi-dimensional Data Analysis and Applications Basing on LLM Agents and Knowledge Graph Interactions

Xi Wang, Xianyao Ling, Kun Li et al.

In the current era of big data, extracting deep insights from massive, heterogeneous, and complexly associated multi-dimensional data has become a significant challenge. Large Language Models (LLMs) perform well in natural language understanding and generation, but still suffer from "hallucination" issues when processing structured knowledge and are difficult to update in real-time. Although Knowledge Graphs (KGs) can explicitly store structured knowledge, their static nature limits dynamic interaction and analytical capabilities. Therefore, this paper proposes a multi-dimensional data analysis method based on the interactions between LLM agents and KGs, constructing a dynamic, collaborative analytical ecosystem. This method utilizes LLM agents to automatically extract product data from unstructured data, constructs and visualizes the KG in real-time, and supports users in deep exploration and analysis of graph nodes through an interactive platform. Experimental results show that this method has significant advantages in product ecosystem analysis, relationship mining, and user-driven exploratory analysis, providing new ideas and tools for multi-dimensional data analysis.

CVMar 10, 2025
Temporal Overlapping Prediction: A Self-supervised Pre-training Method for LiDAR Moving Object Segmentation

Ziliang Miao, Runjian Chen, Yixi Cai et al.

Moving object segmentation (MOS) on LiDAR point clouds is crucial for autonomous systems like self-driving vehicles. Previous supervised approaches rely heavily on costly manual annotations, while LiDAR sequences naturally capture temporal motion cues that can be leveraged for self-supervised learning. In this paper, we propose Temporal Overlapping Prediction (TOP), a self-supervised pre-training method that alleviate the labeling burden for MOS. TOP explores the temporal overlapping points that commonly observed by current and adjacent scans, and learns spatiotemporal representations by predicting the occupancy states of temporal overlapping points. Moreover, we utilize current occupancy reconstruction as an auxiliary pre-training objective, which enhances the current structural awareness of the model. We conduct extensive experiments and observe that the conventional metric Intersection-over-Union (IoU) shows strong bias to objects with more scanned points, which might neglect small or distant objects. To compensate for this bias, we introduce an additional metric called mIoU_obj to evaluate object-level performance. Experiments on nuScenes and SemanticKITTI show that TOPoutperforms both supervised training-from-scratch baseline and other self-supervised pre-training baselines by up to 28.77% relative improvement, demonstrating strong transferability across LiDAR setups and generalization to other tasks. Code and pre-trained models will be publicly available upon publication.

LGJan 3, 2025
Multivariate Time Series Anomaly Detection using DiffGAN Model

Guangqiang Wu, Fu Zhang

In recent years, some researchers have applied diffusion models to multivariate time series anomaly detection. The partial diffusion strategy, which depends on the diffusion steps, is commonly used for anomaly detection in these models. However, different diffusion steps have an impact on the reconstruction of the original data, thereby impacting the effectiveness of anomaly detection. To address this issue, we propose a novel method named DiffGAN, which adds a generative adversarial network component to the denoiser of diffusion model. This addition allows for the simultaneous generation of noisy data and prediction of diffusion steps. Compared to multiple state-of-the-art reconstruction models, experimental results demonstrate that DiffGAN achieves superior performance in anomaly detection.

ROFeb 24, 2022
Bubble Planner: Planning High-speed Smooth Quadrotor Trajectories using Receding Corridors

Yunfan Ren, Fangcheng Zhu, Wenyi Liu et al.

Quadrotors are agile platforms. With human experts, they can perform extremely high-speed flights in cluttered environments. However, fully autonomous flight at high speed remains a significant challenge. In this work, we propose a motion planning algorithm based on the corridor-constrained minimum control effort trajectory optimization (MINCO) framework. Specifically, we use a series of overlapping spheres to represent the free space of the environment and propose two novel designs that enable the algorithm to plan high-speed quadrotor trajectories in real-time. One is a sampling-based corridor generation method that generates spheres with large overlapped areas (hence overall corridor size) between two neighboring spheres. The second is a Receding Horizon Corridors (RHC) strategy, where part of the previously generated corridor is reused in each replan. Together, these two designs enlarge the corridor spaces in accordance with the quadrotor's current state and hence allow the quadrotor to maneuver at high speeds. We benchmark our algorithm against other state-of-the-art planning methods to show its superiority in simulation. Comprehensive ablation studies are also conducted to show the necessity of the two designs. The proposed method is finally evaluated on an autonomous LiDAR-navigated quadrotor UAV in woods environments, achieving flight speeds over 13.7 m/s without any prior map of the environment or external localization facility.

ROSep 14, 2021
Targetless Extrinsic Calibration of Multiple Small FoV LiDARs and Cameras using Adaptive Voxelization

Xiyuan Liu, Chongjian Yuan, Fu Zhang

Determining the extrinsic parameter between multiple LiDARs and cameras is essential for autonomous robots, especially for solid-state LiDARs, where each LiDAR unit has a very small Field-of-View (FoV), and multiple units are often used collectively. The majority of extrinsic calibration methods are proposed for 360$^\circ$ mechanical spinning LiDARs where the FoV overlap with other LiDAR or camera sensors is assumed. Few research works have been focused on the calibration of small FoV LiDARs and cameras nor on the improvement of the calibration speed. In this work, we consider the problem of extrinsic calibration among small FoV LiDARs and cameras, with the aim to shorten the total calibration time and further improve the calibration precision. We first implement an adaptive voxelization technique in the extraction and matching of LiDAR feature points. Such a process could avoid the redundant creation of $k$-d trees in LiDAR extrinsic calibration and extract LiDAR feature points in a more reliable and fast manner than existing methods. We then formulate the multiple LiDAR extrinsic calibration into a LiDAR Bundle Adjustment (BA) problem. By deriving the cost function up to second-order, the solving time and precision of the non-linear least square problem are further boosted. Our proposed method has been verified on data collected in four targetless scenes and under two types of solid-state LiDARs with a completely different scanning pattern, density, and FoV. The robustness of our work has also been validated under eight initial setups, with each setup containing 100 independent trials. Compared with the state-of-the-art methods, our work has increased the calibration speed 15 times for LiDAR-LiDAR extrinsic calibration and 1.5 times for LiDAR-Camera extrinsic calibration (averaged result from 50 independent trials) while remaining accurate.

ROJun 29, 2021
Model Predictive Control for Trajectory Tracking on Differentiable Manifolds

Guozheng Lu, Wei Xu, Fu Zhang

We consider the problem of bridging the gap between geometric tracking control theory and implementation of model predictive control (MPC) for robotic systems operating on manifolds. We propose a generic on-manifold MPC formulation based on a canonical representation of the system evolving on manifolds. Then, we present a method that solves the on-manifold MPC formulation by linearizing the system along the trajectory under tracking. There are two main advantages of the proposed scheme. The first is that the linearized system leads to an equivalent error system represented by a set of minimal parameters without any singularity. Secondly, the process of system modeling, error-system derivation, linearization and control has the manifold constraints completely decoupled from the system descriptions, enabling the development of a symbolic MPC framework that naturally encapsulates the manifold constraints. In this framework, users need only to supply system-specific descriptions without dealing with the manifold constraints. We implement this framework and test it on a quadrotor unmanned aerial vehicle (UAV) operating on $SO(3) \times \mathbb{R}^n$ and an unmanned ground vehicle (UGV) moving on a curved surface. Real-world experiments show that the proposed framework and implementation achieve high tracking performance and computational efficiency even in highly aggressive aerobatic quadrotor maneuvers.

ROApr 5, 2021
Control of a Tail-Sitter VTOL UAV Based on Recurrent Neural Networks

Jinni Zhou, Hao Xu, Zexiang Li et al.

Tail-sitter vertical takeoff and landing (VTOL) unmanned aerial vehicles (UAVs) have the capability of hovering and performing efficient level flight with compact mechanical structures. We present a unified controller design for such UAVs, based on recurrent neural networks. An advantage of this design method is that the various flight modes (i.e., hovering, transition and level flight) of a VTOL UAV are controlled in a unified manner, as opposed to treating them separately and in the runtime switching one from another. The proposed controller consists of an outer-loop position controller and an inner-loop attitude controller. The inner-loop controller is composed of a proportional attitude controller and a loop-shaping linear angular rate controller. For the outer-loop controller, we propose a nonlinear solver to compute the desired attitude and thrust, based on the UAV dynamics and an aerodynamic model, in addition to a cascaded PID controller for the position and velocity tracking. We employ a recurrent neural network (RNN) to approximate the behavior of the nonlinear solver, which suffers from high computational complexity. The proposed RNN has negligible approximation errors, and can be implemented in real-time (e.g., 50 Hz). Moreover, the RNN generates much smoother outputs than the nonlinear solver. We provide an analysis of the stability and robustness of the overall closed-loop system. Simulation and experiments are also presented to demonstrate the effectiveness of the proposed method.

ROFeb 22, 2021
ikd-Tree: An Incremental K-D Tree for Robotic Applications

Yixi Cai, Wei Xu, Fu Zhang

This paper proposes an efficient data structure, ikd-Tree, for dynamic space partition. The ikd-Tree incrementally updates a k-d tree with new coming points only, leading to much lower computation time than existing static k-d trees. Besides point-wise operations, the ikd-Tree supports several features such as box-wise operations and down-sampling that are practically useful in robotic applications. In parallel to the incremental operations (i.e., insert, re-insert, and delete), ikd-Tree actively monitors the tree structure and partially re-balances the tree, which enables efficient nearest point search in later stages. The ikd-Tree is carefully engineered and supports multi-thread parallel computing to maximize the overall efficiency. We validate the ikd-Tree in both theory and practical experiments. On theory level, a complete time complexity analysis is presented to prove the high efficiency. On experiment level, the ikd-Tree is tested on both randomized datasets and real-world LiDAR point data in LiDAR-inertial odometry and mapping application. In all tests, ikd-Tree consumes only 4% of the running time in a static k-d tree.

ROOct 12, 2020
Robots State Estimation and Observability Analysis Based on Statistical Motion Models

Wei Xu, Dongjiao He, Yixi Cai et al.

This paper presents a generic motion model to capture mobile robots' dynamic behaviors (translation and rotation). The model is based on statistical models driven by white random processes and is formulated into a full state estimation algorithm based on the error-state extended Kalman filtering framework (ESEKF). Major benefits of this method are its versatility, being applicable to different robotic systems without accurately modeling the robots' specific dynamics, and ability to estimate the robot's (angular) acceleration, jerk, or higher-order dynamic states with low delay. Mathematical analysis with numerical simulations are presented to show the properties of the statistical model-based estimation framework and to reveal its connection to existing low-pass filters. Furthermore, a new paradigm is developed for robots observability analysis by developing Lie derivatives and associated partial differentiation directly on manifolds. It is shown that this new paradigm is much simpler and more natural than existing methods based on quaternion parameterizations. It is also scalable to high dimensional systems. A novel \textbf{\textit{thin}} set concept is introduced to characterize the unobservable subset of the system states, providing the theoretical foundation to observability analysis of robotic systems operating on manifolds and in high dimension. Finally, extensive experiments including full state estimation and extrinsic calibration (both POS-IMU and IMU-IMU) on a quadrotor UAV, a handheld platform and a ground vehicle are conducted. Comparisons with existing methods show that the proposed method can effectively estimate all extrinsic parameters, the robot's translation/angular acceleration and other state variables (e.g., position, velocity, attitude) of high accuracy and low delay.

ROJul 3, 2020
A decentralized framework for simultaneous calibration, localization and mapping with multiple LiDARs

Jiarong Lin, Xiyuan Liu, Fu Zhang

LiDAR is playing a more and more essential role in autonomous driving vehicles for objection detection, self localization and mapping. A single LiDAR frequently suffers from hardware failure (e.g., temporary loss of connection) due to the harsh vehicle environment (e.g., temperature, vibration, etc.), or performance degradation due to the lack of sufficient geometry features, especially for solid-state LiDARs with small field of view (FoV). To improve the system robustness and performance in self-localization and mapping, we develop a decentralized framework for simultaneous calibration, localization and mapping with multiple LiDARs. Our proposed framework is based on an extended Kalman filter (EKF), but is specially formulated for decentralized implementation. Such an implementation could potentially distribute the intensive computation among smaller computing devices or resources dedicated for each LiDAR and remove the single point of failure problem. Then this decentralized formulation is implemented on an unmanned ground vehicle (UGV) carrying 5 low-cost LiDARs and moving at $1.3m/s$ in urban environments. Experiment results show that the proposed method can successfully and simultaneously estimate the vehicle state (i.e., pose and velocity) and all LiDAR extrinsic parameters. The localization accuracy is up to 0.2% on the two datasets we collected. To share our findings and to make contributions to the community, meanwhile enable the readers to verify our work, we will release all our source codes and hardware design blueprint on our Github.

ROJun 19, 2020
Low-cost Retina-like Robotic Lidars Based on Incommensurable Scanning

Zheng Liu, Fu Zhang, Xiaoping Hong

High performance lidars are essential in autonomous robots such as self-driving cars, automated ground vehicles and intelligent machines. Traditional mechanical scanning lidars offer superior performance in autonomous vehicles, but the potential mass application is limited by the inherent manufacturing difficulty. We propose a robotic lidar sensor based on incommensurable scanning that allows straightforward mass production and adoption in autonomous robots. Some unique features are additionally permitted by this incommensurable scanning. Similar to the fovea in human retina, this lidar features a peaked central angular density, enabling in applications that prefers eye-like attention. The incommensurable scanning method of this lidar could also provide a much higher resolution than conventional lidars which is beneficial in robotic applications such as sensor calibration. Examples making use of these advantageous features are demonstrated.

AIJun 10, 2020
Consolidating Commonsense Knowledge

Filip Ilievski, Pedro Szekely, Jingwei Cheng et al.

Commonsense reasoning is an important aspect of building robust AI systems and is receiving significant attention in the natural language understanding, computer vision, and knowledge graphs communities. At present, a number of valuable commonsense knowledge sources exist, with different foci, strengths, and weaknesses. In this paper, we list representative sources and their properties. Based on this survey, we propose principles and a representation model in order to consolidate them into a Common Sense Knowledge Graph (CSKG). We apply this approach to consolidate seven separate sources into a first integrated CSKG. We present statistics of CSKG, present initial investigations of its utility on four QA datasets, and list learned lessons.

ROMar 20, 2020
Hybrid aerial ground locomotion with a single passive wheel

Youming Qin, Yihang Li, Xu Wei et al.

Exploiting contacts with environment structures provides extra force support to a UAV, often reducing the power consumption and hence extending the mission time. This paper investigates one such way to exploit flat surfaces in the environment by a novel aerial-ground hybrid locomotion. Our design is a single passive wheel integrated at the UAV bottom, serving a minimal design to date. We present the principle and implementation of such a simple design as well as its control. Flight experiments are conducted to verify the feasibility and the power saving caused by the ground locomotion. Results show that our minimal design allows successful aerial-ground hybrid locomotion even with a less-controllable bi-copter UAV. The ground locomotion saves up to 77% battery without much tuning effort.

ROMar 21, 2019
Flying through a narrow gap using neural network: an end-to-end planning and control approach

Jiarong Lin, Luqi Wang, Fei Gao et al.

In this paper, we investigate the problem of enabling a drone to fly through a tilted narrow gap, without a traditional planning and control pipeline. To this end, we propose an end-to-end policy network, which imitates from the traditional pipeline and is fine-tuned using reinforcement learning. Unlike previous works which plan dynamical feasible trajectories using motion primitives and track the generated trajectory by a geometric controller, our proposed method is an end-to-end approach which takes the flight scenario as input and directly outputs thrust-attitude control commands for the quadrotor. Key contributions of our paper are: 1) presenting an imitate-reinforce training framework. 2) flying through a narrow gap using an end-to-end policy network, showing that learning based method can also address the highly dynamic control problem as the traditional pipeline does (see attached video: https://www.youtube.com/watch?v=jU1qRcLdjx0). 3) propose a robust imitation of an optimal trajectory generator using multilayer perceptrons. 4) show how reinforcement learning can improve the performance of imitation learning, and the potential to achieve higher performance over the model-based method.

CVSep 21, 2016
Detecting facial landmarks in the video based on a hybrid framework

Nian Cai, Zhineng Lin, Fu Zhang et al.

To dynamically detect the facial landmarks in the video, we propose a novel hybrid framework termed as detection-tracking-detection (DTD). First, the face bounding box is achieved from the first frame of the video sequence based on a traditional face detection method. Then, a landmark detector detects the facial landmarks, which is based on a cascaded deep convolution neural network (DCNN). Next, the face bounding box in the current frame is estimated and validated after the facial landmarks in the previous frame are tracked based on the median flow. Finally, the facial landmarks in the current frame are exactly detected from the validated face bounding box via the landmark detector. Experimental results indicate that the proposed framework can detect the facial landmarks in the video sequence more effectively and with lower consuming time compared to the frame-by-frame method via the DCNN.