Fangzhou Zhao

h-index: 2

Papers: 3

Citations: 17

Novelty: 42%

AI Score: 30

Ranked #138,762 of 194,257 authors (top 71%); #45,673 in CV (top 77%)

3 Papers

6.5 · CV · Sep 9, 2024

Proto-OOD: Enhancing OOD Object Detection with Prototype Feature Similarity

Junkun Chen, Jilin Mei, Liang Chen et al.

Neural networks trained on samples from a limited set of categories often mispredict out-of-distribution (OOD) objects. We observe that features of the same category are tightly clustered in feature space, while features of different categories are more dispersed. Based on this observation, we propose using prototype similarity for OOD detection. Drawing on the prototype features widely used in few-shot learning, we introduce a novel OOD detection network structure, Proto-OOD. Proto-OOD uses a contrastive loss to make category prototypes more representative and detects OOD data by evaluating the similarity between input features and category prototypes. During training, a negative embedding generator produces OOD samples that are used to train the similarity module. When Pascal VOC is used as the in-distribution dataset and MS-COCO as the OOD dataset, Proto-OOD significantly reduces the FPR (false positive rate). Moreover, given the limitations of existing evaluation metrics, we propose a more reasonable evaluation protocol. The code will be released.
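As a rough illustration of the prototype-similarity idea, the sketch below builds one prototype per class as the normalized mean of L2-normalized features and scores an input by its highest cosine similarity to any prototype, so that a low score suggests an OOD object. It also computes FPR at 95% TPR, a common OOD metric. This is a minimal sketch under stated assumptions, not Proto-OOD itself: plain cosine similarity stands in for the learned similarity module, there is no contrastive training or negative embedding generator, and all function names are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def class_prototypes(features: Sequence[Sequence[float]],
                     labels: Sequence[Hashable]) -> Dict[Hashable, Vector]:
    """Hypothetical helper: prototype = normalized mean of normalized class features."""
    sums: Dict[Hashable, Vector] = {}
    counts: Dict[Hashable, int] = {}
    for f, y in zip(features, labels):
        unit = l2_normalize(f)
        acc = sums.setdefault(y, [0.0] * len(unit))
        for i, x in enumerate(unit):
            acc[i] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: l2_normalize([x / counts[y] for x in s]) for y, s in sums.items()}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def ood_score(feature: Sequence[float], prototypes: Dict[Hashable, Vector]) -> float:
    """Highest similarity to any known-class prototype; low values suggest OOD.

    Assumption: plain cosine similarity replaces a learned similarity module.
    """
    return max(cosine(feature, p) for p in prototypes.values())


def fpr_at_95_tpr(id_scores: Sequence[float], ood_scores: Sequence[float]) -> float:
    """Fraction of OOD samples accepted as in-distribution when the threshold
    keeps at least 95% of ID samples (a simple approximation of FPR95)."""
    ranked = sorted(id_scores)
    threshold = ranked[math.floor(0.05 * len(ranked))]
    return sum(s >= threshold for s in ood_scores) / len(ood_scores)
```

With toy prototypes built from `[[1, 0], [0.9, 0.1]]` for one class and `[[0, 1], [0.1, 0.9]]` for another, a feature like `[1, 0]` scores close to 1, while `[0.7, 0.7]`, which lies between the two clusters, scores noticeably lower.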

12.8 · CV · Oct 21, 2024

WildOcc: A Benchmark for Off-Road 3D Semantic Occupancy Prediction

Heng Zhai, Jilin Mei, Chen Min et al.

3D semantic occupancy prediction is an essential part of autonomous driving, focusing on capturing the geometric details of scenes. Off-road environments are rich in geometric information, which makes 3D semantic occupancy prediction well suited to reconstructing such scenes. However, most research concentrates on on-road environments, and few methods are designed for off-road 3D semantic occupancy prediction owing to the lack of relevant datasets and benchmarks. To fill this gap, we introduce WildOcc, to our knowledge the first benchmark to provide dense occupancy annotations for off-road 3D semantic occupancy prediction. We also propose a ground truth generation pipeline that employs coarse-to-fine reconstruction to achieve more realistic results. Moreover, we introduce a multi-modal 3D semantic occupancy prediction framework that fuses spatio-temporal information from multi-frame images and point clouds at the voxel level. In addition, we introduce a cross-modality distillation function that transfers geometric knowledge from point clouds to image features.
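The abstract does not specify the form of the cross-modality distillation function. One common way to transfer geometry from a LiDAR branch to an image branch is a feature-mimicking loss on the voxels the point cloud actually observed. The sketch below assumes that form (mean squared error over shared voxel indices, with the LiDAR features treated as a fixed teacher) and uses hypothetical names throughout; it is an illustration of the general technique, not WildOcc's actual loss.

```python
from typing import Dict, List, Tuple

VoxelIndex = Tuple[int, int, int]
VoxelGrid = Dict[VoxelIndex, List[float]]


def cross_modal_distill_loss(image_voxels: VoxelGrid, lidar_voxels: VoxelGrid) -> float:
    """Hypothetical feature-mimicking loss: mean squared error between image-branch
    (student) and LiDAR-branch (teacher) features on voxels the point cloud observed.

    Assumptions: both branches produce features of equal width at the same voxel
    resolution, and the teacher is a fixed target (in an autodiff framework you
    would stop gradients through it).
    """
    total, count = 0.0, 0
    for idx, teacher in lidar_voxels.items():
        student = image_voxels.get(idx)
        if student is None:  # image branch has no feature at this voxel; skip it
            continue
        total += sum((s - t) ** 2 for s, t in zip(student, teacher))
        count += len(teacher)
    # Average over every compared feature channel; 0.0 if no voxels overlap.
    return total / count if count else 0.0
```

Restricting the loss to LiDAR-observed voxels is a deliberate choice in this sketch: it only pushes image features toward point-cloud features where the point cloud provides real geometric evidence.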

10.2 · CV · Aug 12, 2025

ROD: RGB-Only Fast and Efficient Off-road Freespace Detection

Tong Sun, Hongliang Ye, Jilin Mei et al.

Off-road freespace detection is more challenging than on-road freespace detection because the boundaries of traversable areas are blurred. Previous state-of-the-art (SOTA) methods fuse RGB images with LiDAR data. However, computing surface normal maps from LiDAR data significantly increases inference time, which makes these multi-modal methods unsuitable for real-time applications, particularly in real-world scenarios that require a higher frame rate than slow navigation does. This paper presents ROD, a novel RGB-only approach to off-road freespace detection that eliminates the reliance on LiDAR data and its computational demands. Specifically, we use a pre-trained Vision Transformer (ViT) to extract rich features from RGB images and design a lightweight yet efficient decoder; together, these improve both precision and inference speed. ROD establishes a new SOTA on the ORFD and RELLIS-3D datasets while running at 50 FPS, significantly outperforming prior models.