ALOOD: Exploiting Language Representations for LiDAR-based Out-of-Distribution Object Detection
This work addresses a safety-critical problem for autonomous driving systems: it improves the detection of unknown objects, an incremental step forward in out-of-distribution (OOD) detection.
The paper addresses the problem of LiDAR-based 3D object detectors making overconfident predictions for out-of-distribution (OOD) objects, which poses safety risks in autonomous driving. The authors propose ALOOD, a method that integrates language representations from a vision-language model to treat OOD detection as a zero-shot classification task, achieving competitive performance on the nuScenes OOD benchmark.
LiDAR-based 3D object detection plays a critical role in reliable and safe autonomous driving systems. However, existing detectors often produce overly confident predictions for objects that do not belong to any known category, posing significant safety risks. Such out-of-distribution (OOD) objects were not part of the training data, so the detector assigns them incorrect predictions. To address this challenge, we propose ALOOD (Aligned LiDAR representations for Out-Of-Distribution Detection), a novel approach that incorporates language representations from a vision-language model (VLM). By aligning the object features from the object detector to the feature space of the VLM, we can treat the detection of OOD objects as a zero-shot classification task. We demonstrate competitive performance on the nuScenes OOD benchmark, establishing a novel approach to OOD object detection in LiDAR using language representations. The source code is available at https://github.com/uulm-mrm/mmood3d.
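To make the zero-shot formulation concrete, the following is a minimal sketch of how aligned object features could be scored against text embeddings. It is an illustration, not the ALOOD implementation. The function names, the temperature value, the toy embeddings, and the OOD score (one minus the maximum class probability) are all assumptions chosen for the example. In practice the text embeddings would come from the VLM's text encoder and the object features from the aligned detector head.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def zero_shot_probs(object_feats, text_embeds, temperature=0.07):
    """Softmax over cosine similarities between object features and class text embeddings.

    object_feats: (num_objects, dim) features assumed to be aligned to the VLM space.
    text_embeds:  (num_classes, dim) embeddings of the known-class prompts.
    temperature:  hypothetical scaling factor (an assumption, not taken from the paper).
    """
    obj = l2_normalize(object_feats)
    txt = l2_normalize(text_embeds)
    logits = obj @ txt.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def ood_score(probs):
    """Assumed scoring rule: low maximum class probability means more likely OOD."""
    return 1.0 - probs.max(axis=1)


# Toy stand-ins for VLM text embeddings of three known classes (4-dim space).
text_embeds = np.eye(3, 4)

# Object 0 matches class 0 closely; object 1 is equally similar to every class.
object_feats = np.array([
    [5.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 0.0],
])

probs = zero_shot_probs(object_feats, text_embeds)
scores = ood_score(probs)
print(probs.argmax(axis=1), scores)
```

In this toy setup the second object receives the higher OOD score because its feature is equally similar to every known-class embedding. A threshold on the score would then separate known from unknown objects, and how that threshold is chosen is outside the scope of this sketch.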