CV · May 12, 2025

Now you see it, Now you don't: Damage Label Agreement in Drone & Satellite Post-Disaster Imagery

Microsoft
arXiv:2505.08117v1 · 4 citations · h-index: 7 · FAccT
Originality: Synthesis-oriented
AI Analysis

This paper addresses label reliability in disaster damage assessment, which is crucial for deploying ethical and effective machine learning systems in humanitarian contexts. The contribution is incremental: it builds on prior work with more data and standardized methods.

This paper audits damage labels from satellite and drone imagery for 15,814 buildings across three hurricanes, finding 29.02% label disagreement and that satellite-derived labels under-report damage by at least 20.43% compared to drone-derived labels, indicating risks for machine learning systems.

This paper audits damage labels derived from coincident satellite and drone aerial imagery for 15,814 buildings across Hurricanes Ian, Michael, and Harvey, finding 29.02% label disagreement and significantly different distributions between the two sources, which presents risks and potential harms during the deployment of machine learning damage assessment systems. Currently, there is no known study of label agreement between drone and satellite imagery for building damage assessment. The only prior work that could be used to infer whether such imagery-derived labels agree is limited by differing damage label schemas, misaligned building locations, and low data quantities. This work overcomes these limitations by comparing damage labels using the same damage label schemas and building locations from three hurricanes, with the 15,814 buildings representing 19.05 times more buildings than the most relevant prior work considered. The analysis finds satellite-derived labels significantly under-report damage by at least 20.43% compared to drone-derived labels (p<1.2x10^-117), and satellite- and drone-derived labels represent significantly different distributions (p<5.1x10^-175). This indicates that computer vision and machine learning (CV/ML) models trained on at least one of these distributions will misrepresent actual conditions, as the differing satellite- and drone-derived distributions cannot simultaneously represent the distribution of actual conditions in a scene. This potential misrepresentation poses ethical risks and potential societal harm if not managed. To reduce the risk of future societal harms, this paper offers four recommendations to improve reliability and transparency to decision-makers when deploying CV/ML damage assessment systems in practice.
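To make the quantities in the abstract concrete, the following is a minimal sketch of how a disagreement rate and a direction-of-disagreement check could be computed for paired labels on the same buildings. It is not the paper's method: the integer label scale, the toy data, the function names `agreement_summary` and `sign_test_p`, and the choice of an exact sign test are all assumptions made for illustration, and the paper's own definition of the 20.43% under-reporting figure and its statistical tests may differ.

```python
from math import comb

def agreement_summary(satellite, drone):
    """Summarize paired ordinal damage labels for the same buildings.

    Labels are assumed to be integers on a shared ordinal scale
    (higher = more damage). This scale is hypothetical.
    """
    if len(satellite) != len(drone) or not satellite:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(satellite)
    # Count buildings where the satellite label is lower / higher than the drone label.
    lower = sum(s < d for s, d in zip(satellite, drone))
    higher = sum(s > d for s, d in zip(satellite, drone))
    return {
        "n": n,
        "lower": lower,
        "higher": higher,
        "disagreement_rate": (lower + higher) / n,
        "satellite_lower_rate": lower / n,
    }

def sign_test_p(lower, higher):
    """Two-sided exact sign test on discordant pairs (agreeing pairs are dropped)."""
    m = lower + higher
    if m == 0:
        return 1.0
    k = min(lower, higher)
    # Probability of k or fewer "successes" out of m under a fair coin, doubled.
    tail = sum(comb(m, i) for i in range(k + 1)) / 2 ** m
    return min(1.0, 2 * tail)

# Toy data only; these are not values from the paper.
satellite_labels = [0, 0, 1, 2, 3, 0, 1, 4]
drone_labels = [0, 1, 2, 2, 4, 0, 3, 4]

summary = agreement_summary(satellite_labels, drone_labels)
p_value = sign_test_p(summary["lower"], summary["higher"])
print(summary["disagreement_rate"], summary["satellite_lower_rate"], p_value)
```

On this toy input, half of the eight buildings disagree, and every disagreement has the satellite label below the drone label. That one-sided pattern is the kind of asymmetry the abstract describes as under-reporting.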

Code Implementations: 1 repo
