Zhiliu Yang

CV
h-index1
5papers
11citations
Novelty55%
AI Score41

5 Papers

CVSep 19, 2024
GaRField++: Reinforced Gaussian Radiance Fields for Large-Scale 3D Scene Reconstruction

Hanyue Zhang, Zhiliu Yang, Xinhe Zuo et al.

This paper proposes a novel framework for large-scale scene reconstruction based on 3D Gaussian splatting (3DGS) and aims to address the scalability and accuracy challenges faced by existing methods. For tackling the scalability issue, we split the large scene into multiple cells, and the candidate point-cloud and camera views of each cell are correlated through a visibility-based camera selection and a progressive point-cloud extension. To reinforce the rendering quality, three highlighted improvements are made in comparison with vanilla 3DGS, which are a strategy of the ray-Gaussian intersection and the novel Gaussians density control for learning efficiency, an appearance decoupling module based on ConvKAN network to solve uneven lighting conditions in large-scale scenes, and a refined final loss with the color loss, the depth distortion loss, and the normal consistency loss. Finally, the seamless stitching procedure is executed to merge the individual Gaussian radiance field for novel view synthesis across different cells. Evaluation of Mill19, Urban3D, and MatrixCity datasets shows that our method consistently generates more high-fidelity rendering results than state-of-the-art methods of large-scale scene reconstruction. We further validate the generalizability of the proposed approach by rendering on self-collected video clips recorded by a commercial drone.

CVOct 29, 2023
TivNe-SLAM: Dynamic Mapping and Tracking via Time-Varying Neural Radiance Fields

Chengyao Duan, Zhiliu Yang

Previous attempts to integrate Neural Radiance Fields (NeRF) into the Simultaneous Localization and Mapping (SLAM) framework either rely on the assumption of static scenes or require the ground truth camera poses, which impedes their application in real-world scenarios. This paper proposes a time-varying representation to track and reconstruct the dynamic scenes. Firstly, two processes, a tracking process and a mapping process, are maintained simultaneously in our framework. In the tracking process, all input images are uniformly sampled and then progressively trained in a self-supervised paradigm. In the mapping process, we leverage motion masks to distinguish dynamic objects from the static background, and sample more pixels from dynamic areas. Secondly, the parameter optimization for both processes is comprised of two stages: the first stage associates time with 3D positions to convert the deformation field to the canonical field. The second stage associates time with the embeddings of the canonical field to obtain colors and a Signed Distance Function (SDF). Lastly, we propose a novel keyframe selection strategy based on the overlapping rate. Our approach is evaluated on two synthetic datasets and one real-world dataset, and the experiments validate that our method achieves competitive results in both tracking and mapping when compared to existing state-of-the-art NeRF-based dynamic SLAM systems.

CVNov 4, 2023
LISNeRF Mapping: LiDAR-based Implicit Mapping via Semantic Neural Fields for Large-Scale 3D Scenes

Jianyuan Zhang, Zhiliu Yang, Meng Zhang

Large-scale semantic mapping is crucial for outdoor autonomous agents to fulfill high-level tasks such as planning and navigation. This paper proposes a novel method for large-scale 3D semantic reconstruction through implicit representations from posed LiDAR measurements alone. We first leverage an octree-based and hierarchical structure to store implicit features, then these implicit features are decoded to semantic information and signed distance value through shallow Multilayer Perceptrons (MLPs). We adopt off-the-shelf algorithms to predict the semantic labels and instance IDs of point clouds. We then jointly optimize the feature embeddings and MLPs parameters with a self-supervision paradigm for point cloud geometry and a pseudo-supervision paradigm for semantic and panoptic labels. Subsequently, categories and geometric structures for novel points are regressed, and marching cubes are exploited to subdivide and visualize the scenes in the inferring stage. For scenarios with memory constraints, a map stitching strategy is also developed to merge sub-maps into a complete map. Experiments on two real-world datasets, SemanticKITTI and SemanticPOSS, demonstrate the superior segmentation efficiency and mapping effectiveness of our framework compared to current state-of-the-art 3D LiDAR mapping methods.

49.9ROApr 2
Hi-LOAM: Hierarchical Implicit Neural Fields for LiDAR Odometry and Mapping

Zhiliu Yang, Jianyuan Zhang, Lianhui Zhao et al.

LiDAR Odometry and Mapping (LOAM) is a pivotal technique for embodied-AI applications such as autonomous driving and robot navigation. Most existing LOAM frameworks are either contingent on the supervision signal, or lack of the reconstruction fidelity, which are deficient in depicting details of large-scale complex scenes. To overcome these limitations, we propose a multi-scale implicit neural localization and mapping framework using LiDAR sensor, called Hi-LOAM. Hi-LOAM receives LiDAR point cloud as the input data modality, learns and stores hierarchical latent features in multiple levels of hash tables based on an octree structure, then these multi-scale latent features are decoded into signed distance value through shallow Multilayer Perceptrons (MLPs) in the mapping procedure. For pose estimation procedure, we rely on a correspondence-free, scan-to-implicit matching paradigm to estimate optimal pose and register current scan into the submap. The entire training process is conducted in a self-supervised manner, which waives the model pre-training and manifests its generalizability when applied to diverse environments. Extensive experiments on multiple real-world and synthetic datasets demonstrate the superior performance, in terms of the effectiveness and generalization capabilities, of our Hi-LOAM compared to existing state-of-the-art methods.

CVOct 14, 2025
SPORTS: Simultaneous Panoptic Odometry, Rendering, Tracking and Segmentation for Urban Scenes Understanding

Zhiliu Yang, Jinyu Dai, Jianyuan Zhang et al.

The scene perception, understanding, and simulation are fundamental techniques for embodied-AI agents, while existing solutions are still prone to segmentation deficiency, dynamic objects' interference, sensor data sparsity, and view-limitation problems. This paper proposes a novel framework, named SPORTS, for holistic scene understanding via tightly integrating Video Panoptic Segmentation (VPS), Visual Odometry (VO), and Scene Rendering (SR) tasks into an iterative and unified perspective. Firstly, VPS designs an adaptive attention-based geometric fusion mechanism to align cross-frame features via enrolling the pose, depth, and optical flow modality, which automatically adjust feature maps for different decoding stages. And a post-matching strategy is integrated to improve identities tracking. In VO, panoptic segmentation results from VPS are combined with the optical flow map to improve the confidence estimation of dynamic objects, which enhances the accuracy of the camera pose estimation and completeness of the depth map generation via the learning-based paradigm. Furthermore, the point-based rendering of SR is beneficial from VO, transforming sparse point clouds into neural fields to synthesize high-fidelity RGB views and twin panoptic views. Extensive experiments on three public datasets demonstrate that our attention-based feature fusion outperforms most existing state-of-the-art methods on the odometry, tracking, segmentation, and novel view synthesis tasks.