Geometric Flood Depth Estimation: Fusing Transformer-Based Segmentation with Digital Elevation Models
For disaster response teams, the method offers a rapid way to obtain 3D flood information from 2D imagery, though the contribution is incremental: it applies existing segmentation and DEM fusion techniques.
This work introduces a method to estimate flood depth from monocular aerial images by combining transformer-based segmentation (Mask2Former) with Digital Elevation Models, producing per-pixel depth estimates without hydrodynamic simulations.
Post-disaster situational awareness relies heavily on understanding both the extent and the volume of floodwaters. While 2D semantic segmentation can delineate flood extent accurately, it lacks the vertical dimension required to assess navigability and structural risk. This paper presents a geometric "Water Surface Elevation" approach for estimating flood depth from monocular aerial imagery. Our pipeline uses Mask2Former, a state-of-the-art transformer-based segmentation model, to generate 2D flood masks. These masks are fused with Digital Elevation Models (DEMs) to identify the water-land boundary, calculate a global water surface elevation ($Z_{water}$) from the terrain elevation along that boundary, and compute per-pixel depth based on the principle of local hydrostatic equilibrium. We evaluate this workflow on the FloodNet and CRASAR-U-DROIDS datasets, demonstrating how high-performance segmentation can be leveraged to extract 3D volumetric data from 2D imagery without the latency of hydrodynamic simulations.
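The fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the flood mask is already co-registered with the DEM on the same pixel grid, takes $Z_{water}$ as the median DEM elevation along the water-land boundary (the choice of median is an assumption made here for robustness to mask errors), and computes depth as $\max(Z_{water} - Z_{DEM}, 0)$ inside the mask. The function names `boundary_pixels` and `estimate_depth` are hypothetical.

```python
import numpy as np


def boundary_pixels(mask: np.ndarray) -> np.ndarray:
    """Return flooded pixels that touch at least one non-flooded 4-neighbour."""
    # Pad with True so that the image border is not mistaken for a shoreline.
    padded = np.pad(mask, 1, mode="constant", constant_values=True)
    all_neighbours_flooded = (
        padded[:-2, 1:-1]    # neighbour above
        & padded[2:, 1:-1]   # neighbour below
        & padded[1:-1, :-2]  # neighbour to the left
        & padded[1:-1, 2:]   # neighbour to the right
    )
    return mask & ~all_neighbours_flooded


def estimate_depth(mask, dem):
    """Estimate a global water surface elevation and per-pixel flood depth.

    mask: 2D boolean array, True where the segmentation model predicts water.
    dem:  2D array of ground elevations (same shape and units, e.g. metres).
    """
    mask = np.asarray(mask, dtype=bool)
    dem = np.asarray(dem, dtype=float)
    if mask.shape != dem.shape:
        raise ValueError("mask and DEM must share the same grid")

    edge = boundary_pixels(mask)
    if not edge.any():
        raise ValueError("no water-land boundary found in mask")

    # Assumption: the median boundary elevation approximates the water surface.
    z_water = float(np.median(dem[edge]))

    # Depth is the water surface minus the ground, clipped at zero and
    # restricted to pixels classified as flooded.
    depth = np.where(mask, np.clip(z_water - dem, 0.0, None), 0.0)
    return z_water, depth


if __name__ == "__main__":
    # Toy bowl-shaped terrain: rim at 3 m, slope at 2 m, lowest point at 1 m.
    dem = np.array([
        [3, 3, 3, 3, 3],
        [3, 2, 2, 2, 3],
        [3, 2, 1, 2, 3],
        [3, 2, 2, 2, 3],
        [3, 3, 3, 3, 3],
    ], dtype=float)
    mask = dem <= 2  # pretend the segmentation flagged the bowl as flooded
    z_water, depth = estimate_depth(mask, dem)
    print("Z_water:", z_water)
    print(depth)
```

In practice the mask would come from the segmentation model and the DEM from a georeferenced raster, so a resampling or reprojection step would be needed before this function applies. A single global $Z_{water}$ is only reasonable for a hydraulically connected water body within one scene.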