CV
Mar 1

Open-Vocabulary vs Supervised Learning Methods for Post-Disaster Visual Scene Understanding

arXiv:2603.01324v1
h-index: 10
Originality: Synthesis-oriented
AI Analysis

This work addresses automated damage assessment from aerial imagery for disaster response. Its contribution is incremental: it provides a comparative evaluation rather than introducing new methods.

The paper compares supervised learning and open-vocabulary vision models for post-disaster visual scene understanding and finds that supervised training remains the most reliable approach, especially for small objects and fine boundary delineation in cluttered scenes.

Aerial imagery is critical for large-scale post-disaster damage assessment. Automated interpretation remains challenging due to clutter, visual variability, and strong cross-event domain shift, while supervised approaches still rely on costly, task-specific annotations with limited coverage across disaster types and regions. Recent open-vocabulary and foundation vision models offer an appealing alternative: they reduce dependence on fixed label sets and extensive task-specific annotations by leveraging large-scale pretraining and vision-language representations. These properties are particularly relevant for post-disaster domains, where visual concepts are ambiguous and data availability is constrained. In this work, we present a comparative evaluation of supervised learning and open-vocabulary vision models for post-disaster scene understanding, focusing on semantic segmentation and object detection across multiple datasets, including FloodNet+, RescueNet, DFire, and LADD. We examine performance trends, failure modes, and practical trade-offs between different learning paradigms, providing insight into their applicability for real-world disaster response. The most notable finding across all evaluated benchmarks is that supervised training remains the most reliable approach (i.e., when the label space is fixed and annotations are available), especially for small objects and fine boundary delineation in cluttered scenes.
