First Place Solution to the ECCV 2024 BRAVO Challenge: Evaluating Robustness of Vision Foundation Models for Semantic Segmentation
This work addresses robustness evaluation for vision models in semantic segmentation, but it is incremental as it applies an existing method to a new challenge.
The paper tackles robustness evaluation of vision foundation models for semantic segmentation by fine-tuning DINOv2 with a segmentation decoder on Cityscapes, and the resulting model achieves first place in the ECCV 2024 BRAVO Challenge.
In this report, we present the first-place solution to the ECCV 2024 BRAVO Challenge, where a model is trained on Cityscapes and its robustness is evaluated on several out-of-distribution datasets. Our solution leverages the powerful representations learned by vision foundation models by attaching a simple segmentation decoder to DINOv2 and fine-tuning the entire model. This simple approach outperforms more complex existing methods and achieves first place in the challenge. Our code is publicly available at https://github.com/tue-mps/benchmark-vfm-ss.
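To make the recipe concrete, the following is a minimal PyTorch sketch of "a simple decoder on top of a ViT backbone, fine-tuned end to end". It is not the released implementation. `StandInBackbone` is a hypothetical placeholder that only mimics the image-to-patch-token interface of a DINOv2 encoder so that the example runs without downloading weights. The linear per-patch head, the embedding width of 384, the learning rate, and the 56x56 toy input are likewise assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 19    # Cityscapes evaluation classes
IGNORE_INDEX = 255  # Cityscapes train-ID convention for unlabeled pixels
PATCH_SIZE = 14     # patch size used by DINOv2 ViTs
EMBED_DIM = 384     # assumed token width (ViT-S sized) for this sketch


class StandInBackbone(nn.Module):
    """Hypothetical placeholder for a pretrained DINOv2 encoder.

    It only reproduces the interface: image -> (B, N, D) patch tokens.
    """

    def __init__(self, embed_dim=EMBED_DIM, patch_size=PATCH_SIZE):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, embed_dim, patch_size, stride=patch_size)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, images):
        tokens = self.patch_embed(images).flatten(2).transpose(1, 2)
        return self.norm(tokens)  # (B, N, D)


class LinearDecoderSegmenter(nn.Module):
    """Backbone plus an assumed linear per-patch classification head."""

    def __init__(self, backbone, embed_dim=EMBED_DIM,
                 num_classes=NUM_CLASSES, patch_size=PATCH_SIZE):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(embed_dim, num_classes)
        self.patch_size = patch_size

    def forward(self, images):
        b, _, h, w = images.shape
        gh, gw = h // self.patch_size, w // self.patch_size
        tokens = self.backbone(images)                # (B, gh*gw, D)
        logits = self.head(tokens)                    # (B, gh*gw, C)
        logits = logits.transpose(1, 2).reshape(b, -1, gh, gw)
        # Upsample patch-level logits back to pixel resolution.
        return F.interpolate(logits, size=(h, w), mode="bilinear",
                             align_corners=False)


torch.manual_seed(0)
model = LinearDecoderSegmenter(StandInBackbone())

# Full fine-tuning: all parameters, backbone included, go to the optimizer.
# The learning rate is an illustrative value, not taken from the report.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Toy batch standing in for Cityscapes crops (side length divisible by 14).
images = torch.randn(2, 3, 56, 56)
labels = torch.randint(0, NUM_CLASSES, (2, 56, 56))
labels[:, :4, :] = IGNORE_INDEX  # mark some pixels as unlabeled

logits = model(images)
loss = F.cross_entropy(logits, labels, ignore_index=IGNORE_INDEX)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In practice the placeholder would be swapped for a pretrained DINOv2 encoder. What the sketch is meant to show is that no backbone parameter is frozen, so gradients from the segmentation loss reach the encoder as well as the head.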