RoofNet: A Global Multimodal Dataset for Roof Material Classification
This dataset fills a critical gap in global exposure data for disaster risk modeling, enabling more accurate vulnerability assessments for earthquakes, floods, wildfires, and hurricanes.
RoofNet introduces the largest and most geographically diverse multimodal dataset for roof material classification, comprising over 51,500 samples from 184 sites, to address the lack of global roof material data for natural hazard vulnerability modeling. The dataset pairs satellite imagery with text annotations and uses vision-language modeling with prompt tuning to classify 14 roofing types.
Natural disasters are increasing in frequency and severity, causing hundreds of billions of dollars in damage annually and posing growing threats to infrastructure and human livelihoods. Accurate data on roofing materials is critical for modeling building vulnerability to natural hazards such as earthquakes, floods, wildfires, and hurricanes, yet such data remain unavailable at a global scale. To address this gap, we introduce RoofNet, the largest and most geographically diverse multimodal dataset to date for global roof material classification, comprising over 51,500 samples from 184 sites that pair high-resolution Earth Observation (EO) imagery with curated text annotations. RoofNet labels satellite imagery with 14 key roofing types and is designed to enhance the fidelity of global exposure datasets through vision-language modeling (VLM). We sample EO tiles from climatically and architecturally distinct regions to construct a representative dataset. A subset of 6,000 images was annotated in collaboration with domain experts and used to fine-tune a VLM, with geographic- and material-aware prompt tuning to enhance class separability. The fine-tuned model was then applied to the remaining EO tiles, and its predictions were refined through rule-based and human-in-the-loop verification. In addition to material labels, RoofNet provides rich metadata including roof shape, footprint area, solar panel presence, and indicators of mixed roofing materials (e.g., HVAC systems).
Note: The dataset used in earlier experiments has been removed due to licensing constraints related to imagery sources. Results based on this dataset should be interpreted with caution. Updated experiments using compliant data are in progress.
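To make the labeling pipeline described in the abstract more concrete, the following is a minimal sketch of two of its steps: composing a text prompt that names both material and region, and routing model predictions to either automatic acceptance or human review. It is an illustration under stated assumptions, not the authors' implementation. The class names, the prompt template, the region-to-material rule, and the 0.8 confidence threshold are all hypothetical, and the hand-written template stands in for the tuned prompts the abstract refers to.

```python
from dataclasses import dataclass

# Hypothetical subset of classes: the abstract mentions 14 roofing types
# but does not list them, so these names are placeholders.
ROOF_CLASSES = ["metal sheet", "clay tile", "concrete", "thatch"]


def build_prompt(material: str, region: str) -> str:
    """Compose a text prompt naming both the material and the region.

    A fixed template is used here for illustration; it is an assumption,
    not the prompt format used by the authors.
    """
    return f"a satellite image of a {material} roof in {region}"


@dataclass
class Prediction:
    tile_id: str
    material: str
    confidence: float  # model score in [0, 1]


def triage(pred: Prediction, plausible_materials: set, threshold: float = 0.8) -> str:
    """Return 'accept' or 'review' for a single model prediction.

    Two hypothetical checks are applied: a confidence threshold, and a
    rule that the predicted material is plausible for the tile's region.
    Anything failing either check goes to a human reviewer.
    """
    if pred.confidence < threshold:
        return "review"
    if pred.material not in plausible_materials:
        return "review"
    return "accept"


if __name__ == "__main__":
    prompts = [build_prompt(m, "southern Europe") for m in ROOF_CLASSES]
    print(prompts[1])
    pred = Prediction("tile_0001", "clay tile", 0.93)
    print(triage(pred, {"clay tile", "concrete"}))
```

In this sketch, only predictions that are both confident and consistent with the regional rule bypass human review, which mirrors the combination of rule-based and human-in-the-loop verification the abstract mentions.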