Language-Guided Open-World Anomaly Segmentation
This work addresses open-world anomaly segmentation for autonomous driving systems, enabling detection and labeling of unknown objects; it is an incremental improvement over existing methods.
The paper tackles the problem of detecting and segmenting unknown objects in autonomous driving scenes by proposing Clipomaly, a CLIP-based method that segments anomalies and assigns human-interpretable names without needing anomaly-specific training data. It achieves state-of-the-art performance on anomaly segmentation benchmarks.
Open-world and anomaly segmentation methods seek to enable autonomous driving systems to detect and segment both known and unknown objects in real-world scenes. However, existing methods do not assign semantically meaningful labels to unknown regions, and distinguishing and learning representations for unknown classes remains difficult. While open-vocabulary segmentation methods show promise in generalizing to novel classes, they require a fixed inference vocabulary and thus cannot be directly applied to anomaly segmentation where unknown classes are unconstrained. We propose Clipomaly, the first CLIP-based open-world and anomaly segmentation method for autonomous driving. Our zero-shot approach requires no anomaly-specific training data and leverages CLIP's shared image-text embedding space to both segment unknown objects and assign human-interpretable names to them. Unlike open-vocabulary methods, our model dynamically extends its vocabulary at inference time without retraining, enabling robust detection and naming of anomalies beyond common class definitions such as those in Cityscapes. Clipomaly achieves state-of-the-art performance on established anomaly segmentation benchmarks while providing interpretability and flexibility essential for practical deployment.
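The idea of using a shared image-text embedding space to both flag and name unknown regions can be illustrated with a minimal sketch. It is not the Clipomaly implementation: the abstract does not specify the scoring rule, so the anomaly score (one minus the best cosine similarity to any known class), the `threshold` value, the function names, and the toy 3-dimensional vectors standing in for CLIP embeddings are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(embedding, vocabulary):
    """Return (name, similarity) of the vocabulary entry closest to embedding."""
    scored = ((name, cosine(embedding, vec)) for name, vec in vocabulary.items())
    return max(scored, key=lambda pair: pair[1])


def label_pixel(embedding, known_vocab, extra_vocab, threshold=0.5):
    """Label one pixel embedding.

    Assumed rule (hypothetical): a pixel is anomalous when its best
    similarity to every known class is low. Anomalous pixels are then
    named from an extra vocabulary supplied at inference time.
    """
    name, similarity = best_match(embedding, known_vocab)
    score = 1.0 - similarity  # assumed anomaly score, not from the paper
    if score <= threshold:
        return name, score
    anomaly_name, _ = best_match(embedding, extra_vocab)
    return anomaly_name, score


# Toy vectors standing in for text embeddings of class names.
known = {"road": [1.0, 0.0, 0.0], "car": [0.0, 1.0, 0.0]}
# Vocabulary extended at inference time, with no retraining involved.
extra = {"deer": [0.0, 0.0, 1.0], "traffic cone": [0.7, 0.7, 0.1]}

road_pixel = [0.9, 0.1, 0.0]
unknown_pixel = [0.1, 0.1, 0.95]

road_label, road_score = label_pixel(road_pixel, known, extra)
unknown_label, unknown_score = label_pixel(unknown_pixel, known, extra)
print(road_label, round(road_score, 3))
print(unknown_label, round(unknown_score, 3))
```

Because the extra vocabulary is only a dictionary of text embeddings, adding a new name in this sketch amounts to adding one entry, which mirrors the claim that the vocabulary can grow at inference time without retraining.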