Multimodal Foundational Models for Unsupervised 3D General Obstacle Detection
The work addresses the detection of varied and edge-case obstacles in autonomous driving perception, though it appears incremental, since it builds on existing foundational models and computational geometry methods.
The paper tackles 3D detection of general obstacles for autonomous driving, a task on which current supervised models struggle because they are tied to a fixed set of categories. It combines multimodal foundational model-based segmentation with unsupervised computational geometry and thereby detects such obstacles without expensive retraining.
Current autonomous driving perception models primarily rely on supervised learning with predefined categories. However, these models struggle to detect general obstacles outside the fixed category set, because such obstacles vary widely and give rise to numerous edge cases. To address this issue, we propose combining multimodal foundational model-based obstacle segmentation with traditional unsupervised computational geometry-based outlier detection. Our approach operates offline, which allows us to leverage non-causality, and it uses only training-free methods. This enables the detection of general obstacles in 3D without the need for expensive retraining. To overcome the limitations of publicly available obstacle detection datasets, we collected and annotated our own dataset, which includes various obstacles even in distant regions.
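
The combination described above can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say which geometric technique is used, so RANSAC ground-plane fitting is assumed here as one common unsupervised choice. The function names, thresholds, and the boolean obstacle_mask (a stand-in for the per-point output of a foundational segmentation model) are all hypothetical.

```python
import numpy as np

def fit_ground_plane(points, n_iters=200, threshold=0.05, seed=0):
    """RANSAC plane fit (assumed technique, not taken from the paper).

    Returns (normal, d) with unit normal pointing up and normal . p + d = 0.
    """
    rng = np.random.default_rng(seed)
    best_count, best_plane = -1, None
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:  # skip degenerate (collinear) samples
            continue
        normal = normal / norm
        if normal[2] < 0:  # orient the normal upward
            normal = -normal
        d = -normal @ sample[0]
        count = int(np.sum(np.abs(points @ normal + d) < threshold))
        if count > best_count:
            best_count, best_plane = count, (normal, d)
    return best_plane

def detect_obstacle_points(points, obstacle_mask, min_height=0.2):
    """Keep points that the (hypothetical) segmentation mask flags AND that
    are geometric outliers, i.e. lie more than min_height above the plane."""
    normal, d = fit_ground_plane(points)
    height = points @ normal + d
    return obstacle_mask & (height > min_height)

# Synthetic scene: 500 noisy ground points followed by 20 obstacle points.
rng = np.random.default_rng(1)
ground = np.column_stack([rng.uniform(-10, 10, 500),
                          rng.uniform(-10, 10, 500),
                          rng.normal(0.0, 0.01, 500)])
obstacle = np.column_stack([rng.uniform(4.5, 5.5, 20),
                            rng.uniform(-0.5, 0.5, 20),
                            rng.uniform(0.5, 1.0, 20)])
points = np.vstack([ground, obstacle])

# Stand-in segmentation mask: all obstacle points plus a few false positives
# on the ground, which the geometric check is expected to reject.
mask = np.zeros(len(points), dtype=bool)
mask[500:] = True
mask[:10] = True

result = detect_obstacle_points(points, mask)
```

In this toy scene the geometric check discards the segmentation false positives on the ground while keeping the raised points. A real pipeline would additionally have to project image-space masks onto the 3D points and could aggregate several frames, which offline operation permits.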