CVApr 22, 2023Code
Knowledge Distillation from 3D to Bird's-Eye-View for LiDAR Semantic SegmentationFeng Jiang, Heng Gao, Shoumeng Qiu et al.
LiDAR point cloud segmentation is one of the most fundamental tasks for autonomous driving scene understanding. However, it is difficult for existing models to achieve both high inference speed and accuracy simultaneously. For example, voxel-based methods perform well in accuracy, while Bird's-Eye-View (BEV)-based methods can achieve real-time inference. To overcome this issue, we develop an effective 3D-to-BEV knowledge distillation method that transfers rich knowledge from 3D voxel-based models to BEV-based models. Our framework mainly consists of two modules: the voxel-to-pillar distillation module and the label-weight distillation module. Voxel-to-pillar distillation distills sparse 3D features to BEV features for middle layers to make the BEV-based model aware of more structural and geometric information. Label-weight distillation helps the model pay more attention to regions with more height information. Finally, we conduct experiments on the SemanticKITTI dataset and Paris-Lille-3D. The results on SemanticKITTI show more than 5% improvement on the test set, especially for classes such as motorcycle and person, with more than 15% improvement. The code can be accessed at https://github.com/fengjiang5/Knowledge-Distillation-from-Cylinder3D-to-PolarNet.
CVMar 22, 2025Code
Multi-modality Anomaly Segmentation on the RoadHeng Gao, Zhuolin He, Shoumeng Qiu et al.
Semantic segmentation allows autonomous driving cars to understand the surroundings of the vehicle comprehensively. However, it is also crucial for the model to detect obstacles that may jeopardize the safety of autonomous driving systems. Based on our experiments, we find that current uni-modal anomaly segmentation frameworks tend to produce high anomaly scores for non-anomalous regions in images. Motivated by this empirical finding, we develop a multi-modal uncertainty-based anomaly segmentation framework, named MMRAS+, for autonomous driving systems. MMRAS+ effectively reduces the high anomaly outputs of non-anomalous classes by introducing text-modal using the CLIP text encoder. Indeed, MMRAS+ is the first multi-modal anomaly segmentation solution for autonomous driving. Moreover, we develop an ensemble module to further boost the anomaly segmentation performance. Experiments on RoadAnomaly, SMIYC, and Fishyscapes validation datasets demonstrate the superior performance of our method. The code is available in https://github.com/HengGao12/MMRAS_plus.
MLJun 24, 2024Code
Enhancing OOD Detection Using Latent DiffusionHeng Gao, Jun Li
Out-of-distribution (OOD) detection is crucial for the reliable deployment of machine learning models in real-world scenarios, enabling the identification of unknown samples or objects. A prominent approach to enhance OOD detection performance involves leveraging auxiliary datasets for training. Recent efforts have explored using generative models, such as Stable Diffusion (SD), to synthesize outlier data in the pixel space. However, synthesizing OOD data in the pixel space can lead to reduced robustness due to over-generation. To address this challenge, we propose Outlier-Aware Learning (OAL), a novel framework that generates synthetic OOD training data within the latent space, taking a further step to study how to utilize Stable Diffusion for developing a latent-based outlier synthesis approach. This improvement facilitates network training with fewer outliers and less computational cost. Besides, to regularize the model's decision boundary, we develop a mutual information-based contrastive learning module (MICL) that amplifies the distinction between In-Distribution (ID) and collected OOD data. Moreover, we develop a knowledge distillation module to prevent the degradation of ID classification accuracy when training with OOD data. The superior performance of our method on several benchmark datasets demonstrates its efficiency and effectiveness. Source code is available in https://github.com/HengGao12/OAL.
LGFeb 22, 2025Code
Detecting OOD Samples via Optimal Transport Scoring FunctionHeng Gao, Zhuolin He, Jian Pu
To deploy machine learning models in the real world, researchers have proposed many OOD detection algorithms to help models identify unknown samples during the inference phase and prevent them from making untrustworthy predictions. Unlike methods that rely on extra data for outlier exposure training, post hoc methods detect Out-of-Distribution (OOD) samples by developing scoring functions, which are model agnostic and do not require additional training. However, previous post hoc methods may fail to capture the geometric cues embedded in network representations. Thus, in this study, we propose a novel score function based on the optimal transport theory, named OTOD, for OOD detection. We utilize information from features, logits, and the softmax probability space to calculate the OOD score for each test sample. Our experiments show that combining this information can boost the performance of OTOD with a certain margin. Experiments on the CIFAR-10 and CIFAR-100 benchmarks demonstrate the superior performance of our method. Notably, OTOD outperforms the state-of-the-art method GEN by 7.19% in the mean FPR@95 on the CIFAR-10 benchmark using ResNet-18 as the backbone, and by 12.51% in the mean FPR@95 using WideResNet-28 as the backbone. In addition, we provide theoretical guarantees for OTOD. The code is available in https://github.com/HengGao12/OTOD.
68.2CRMar 16
BadLLM-TG: A Backdoor Defender powered by LLM Trigger GeneratorRuyi Zhang, Heng Gao, Songlei Jian et al.
Backdoor attacks compromise model reliability by using triggers to manipulate outputs. Trigger inversion can accurately locate these triggers via a generator and is therefore critical for backdoor defense. However, the discrete nature of text prevents existing noise-based trigger generator from being applied to nature language processing (NLP). To overcome the limitations, we employ the rich knowledge embedded in large language models (LLMs) and propose a Backdoor defender powered by LLM Trigger Generator, termed BadLLM-TG. It is optimized through prompt-driven reinforcement learning, using the victim model's feedback loss as the reward signal. The generated triggers are then employed to mitigate the backdoor via adversarial training. Experiments show that our method reduces the attack success rate by 76.2\% on average, outperforming the second-best defender by 13.7.
IRMar 9, 2013
The Powerful Model Adpredictor for Search Engine Switching Detection ChallengeHeng Gao, Yongbao Li, Qiudan Li et al.
The purpose of the Switching Detection Challenge in the 2013 WSCD workshop was to predict users' search engine swithcing actions given records about search sessions and logs.Our solution adopted the powerful prediction model Adpredictor and utilized the method of feature engineering. We successfully applied the click through rate (CTR) prediction model Adpredicitor into our solution framework, and then the discovery of effective features and the multiple classification of different switching type make our model outperforms many competitors. We achieved an AUC score of 0.84255 on the private leaderboard and ranked the 5th among all the competitors in the competition.