Yunzhou Zhang

h-index30

9papers

302citations

Novelty53%

AI Score39

Ranked #76,474 of 194,257 authors (top 39%)#2,276 in RO (top 34%)

9 Papers

5.0CVMar 3, 2023Code

BSH-Det3D: Improving 3D Object Detection with BEV Shape Heatmap

You Shen, Yunzhou Zhang, Yanmin Wu et al. · pku

The progress of LiDAR-based 3D object detection has significantly enhanced developments in autonomous driving and robotics. However, due to the limitations of LiDAR sensors, object shapes suffer from deterioration in occluded and distant areas, which creates a fundamental challenge to 3D perception. Existing methods estimate specific 3D shapes and achieve remarkable performance. However, these methods rely on extensive computation and memory, causing imbalances between accuracy and real-time performance. To tackle this challenge, we propose a novel LiDAR-based 3D object detection model named BSH-Det3D, which applies an effective way to enhance spatial features by estimating complete shapes from a bird's eye view (BEV). Specifically, we design the Pillar-based Shape Completion (PSC) module to predict the probability of occupancy whether a pillar contains object shapes. The PSC module generates a BEV shape heatmap for each scene. After integrating with heatmaps, BSH-Det3D can provide additional information in shape deterioration areas and generate high-quality 3D proposals. We also design an attention-based densification fusion module (ADF) to adaptively associate the sparse features with heatmaps and raw points. The ADF module integrates the advantages of points and shapes knowledge with negligible overheads. Extensive experiments on the KITTI benchmark achieve state-of-the-art (SOTA) performance in terms of accuracy and speed, demonstrating the efficiency and flexibility of BSH-Det3D. The source code is available on https://github.com/mystorm16/BSH-Det3D.

20.9ROApr 27, 2020Code

EAO-SLAM: Monocular Semi-Dense Object SLAM Based on Ensemble Data Association

Yanmin Wu, Yunzhou Zhang, Delong Zhu et al.

Object-level data association and pose estimation play a fundamental role in semantic SLAM, which remain unsolved due to the lack of robust and accurate algorithms. In this work, we propose an ensemble data associate strategy for integrating the parametric and nonparametric statistic tests. By exploiting the nature of different statistics, our method can effectively aggregate the information of different measurements, and thus significantly improve the robustness and accuracy of data association. We then present an accurate object pose estimation framework, in which an outliers-robust centroid and scale estimation algorithm and an object pose initialization algorithm are developed to help improve the optimality of pose estimation results. Furthermore, we build a SLAM system that can generate semi-dense or lightweight object-oriented maps with a monocular camera. Extensive experiments are conducted on three publicly available datasets and a real scenario. The results show that our approach significantly outperforms state-of-the-art techniques in accuracy and robustness. The source code is available on: https://github.com/yanmin-wu/EAO-SLAM.

8.4CVSep 6, 2025

JRN-Geo: A Joint Perception Network based on RGB and Normal images for Cross-view Geo-localization

Hongyu Zhou, Yunzhou Zhang, Tingsong Huang et al.

Cross-view geo-localization plays a critical role in Unmanned Aerial Vehicle (UAV) localization and navigation. However, significant challenges arise from the drastic viewpoint differences and appearance variations between images. Existing methods predominantly rely on semantic features from RGB images, often neglecting the importance of spatial structural information in capturing viewpoint-invariant features. To address this issue, we incorporate geometric structural information from normal images and introduce a Joint perception network to integrate RGB and Normal images (JRN-Geo). Our approach utilizes a dual-branch feature extraction framework, leveraging a Difference-Aware Fusion Module (DAFM) and Joint-Constrained Interaction Aggregation (JCIA) strategy to enable deep fusion and joint-constrained semantic and structural information representation. Furthermore, we propose a 3D geographic augmentation technique to generate potential viewpoint variation samples, enhancing the network's ability to learn viewpoint-invariant features. Extensive experiments on the University-1652 and SUES-200 datasets validate the robustness of our method against complex viewpoint ariations, achieving state-of-the-art performance.

2.0CVApr 28, 2024

S3-SLAM: Sparse Tri-plane Encoding for Neural Implicit SLAM

Zhiyao Zhang, Yunzhou Zhang, Yanmin Wu et al. · pku

With the emergence of Neural Radiance Fields (NeRF), neural implicit representations have gained widespread applications across various domains, including simultaneous localization and mapping. However, current neural implicit SLAM faces a challenging trade-off problem between performance and the number of parameters. To address this problem, we propose sparse tri-plane encoding, which efficiently achieves scene reconstruction at resolutions up to 512 using only 2~4% of the commonly used tri-plane parameters (reduced from 100MB to 2~4MB). On this basis, we design S3-SLAM to achieve rapid and high-quality tracking and mapping through sparsifying plane parameters and integrating orthogonal features of tri-plane. Furthermore, we develop hierarchical bundle adjustment to achieve globally consistent geometric structures and reconstruct high-resolution appearance. Experimental results demonstrate that our approach achieves competitive tracking and scene reconstruction with minimal parameters on three datasets. Source code will soon be available.

17.0ROMay 12, 2023

An Object SLAM Framework for Association, Mapping, and High-Level Tasks

Yanmin Wu, Yunzhou Zhang, Delong Zhu et al.

Object SLAM is considered increasingly significant for robot high-level perception and decision-making. Existing studies fall short in terms of data association, object representation, and semantic mapping and frequently rely on additional assumptions, limiting their performance. In this paper, we present a comprehensive object SLAM framework that focuses on object-based perception and object-oriented robot tasks. First, we propose an ensemble data association approach for associating objects in complicated conditions by incorporating parametric and nonparametric statistic testing. In addition, we suggest an outlier-robust centroid and scale estimation algorithm for modeling objects based on the iForest and line alignment. Then a lightweight and object-oriented map is represented by estimated general object models. Taking into consideration the semantic invariance of objects, we convert the object map to a topological map to provide semantic descriptors to enable multi-map matching. Finally, we suggest an object-driven active exploration strategy to achieve autonomous mapping in the grasping scenario. A range of public datasets and real-world results in mapping, augmented reality, scene matching, relocalization, and robotic manipulation have been used to evaluate the proposed object SLAM framework for its efficient performance.

9.2ROFeb 4, 2022

CFP-SLAM: A Real-time Visual SLAM Based on Coarse-to-Fine Probability in Dynamic Environments

Xinggang Hu, Yunzhou Zhang, Zhenzhong Cao et al.

The dynamic factors in the environment will lead to the decline of camera localization accuracy due to the violation of the static environment assumption of SLAM algorithm. Recently, some related works generally use the combination of semantic constraints and geometric constraints to deal with dynamic objects, but problems can still be raised, such as poor real-time performance, easy to treat people as rigid bodies, and poor performance in low dynamic scenes. In this paper, a dynamic scene-oriented visual SLAM algorithm based on object detection and coarse-to-fine static probability named CFP-SLAM is proposed. The algorithm combines semantic constraints and geometric constraints to calculate the static probability of objects, keypoints and map points, and takes them as weights to participate in camera pose estimation. Extensive evaluations show that our approach can achieve almost the best results in high dynamic and low dynamic scenarios compared to the state-of-the-art dynamic SLAM methods, and shows quite high real-time ability.

5.3ROOct 18, 2021

Accurate and Robust Object-oriented SLAM with 3D Quadric Landmark Construction in Outdoor Environment

Rui Tian, Yunzhou Zhang, Yonghui Feng et al.

Object-oriented SLAM is a popular technology in autonomous driving and robotics. In this paper, we propose a stereo visual SLAM with a robust quadric landmark representation method. The system consists of four components, including deep learning detection, object-oriented data association, dual quadric landmark initialization and object-based pose optimization. State-of-the-art quadric-based SLAM algorithms always face observation related problems and are sensitive to observation noise, which limits their application in outdoor scenes. To solve this problem, we propose a quadric initialization method based on the decoupling of the quadric parameters method, which improves the robustness to observation noise. The sufficient object data association algorithm and object-oriented optimization with multiple cues enables a highly accurate object pose estimation that is robust to local observations. Experimental results show that the proposed system is more robust to observation noise and significantly outperforms current state-of-the-art methods in outdoor environments. In addition, the proposed system demonstrates real-time performance.

6.5CVJan 15, 2021

Accurate and Robust Scale Recovery for Monocular Visual Odometry Based on Plane Geometry

Rui Tian, Yunzhou Zhang, Delong Zhu et al.

Scale ambiguity is a fundamental problem in monocular visual odometry. Typical solutions include loop closure detection and environment information mining. For applications like self-driving cars, loop closure is not always available, hence mining prior knowledge from the environment becomes a more promising approach. In this paper, with the assumption of a constant height of the camera above the ground, we develop a light-weight scale recovery framework leveraging an accurate and robust estimation of the ground plane. The framework includes a ground point extraction algorithm for selecting high-quality points on the ground plane, and a ground point aggregation algorithm for joining the extracted ground points in a local sliding window. Based on the aggregated data, the scale is finally recovered by solving a least-squares problem using a RANSAC-based optimizer. Sufficient data and robust optimizer enable a highly accurate scale recovery. Experiments on the KITTI dataset show that the proposed framework can achieve state-of-the-art accuracy in terms of translation errors, while maintaining competitive performance on the rotation error. Due to the light-weight design, our framework also demonstrates a high frequency of 20Hz on the dataset.

5.7RODec 3, 2020

Object SLAM-Based Active Mapping and Robotic Grasping

Yanmin Wu, Yunzhou Zhang, Delong Zhu et al.

This paper presents the first active object mapping framework for complex robotic manipulation and autonomous perception tasks. The framework is built on an object SLAM system integrated with a simultaneous multi-object pose estimation process that is optimized for robotic grasping. Aiming to reduce the observation uncertainty on target objects and increase their pose estimation accuracy, we also design an object-driven exploration strategy to guide the object mapping process, enabling autonomous mapping and high-level perception. Combining the mapping module and the exploration strategy, an accurate object map that is compatible with robotic grasping can be generated. Additionally, quantitative evaluations also indicate that the proposed framework has a very high mapping accuracy. Experiments with manipulation (including object grasping and placement) and augmented reality significantly demonstrate the effectiveness and advantages of our proposed framework.