Handling Image and Label Resolution Mismatch in Remote Sensing
This paper addresses a domain-specific challenge in remote sensing that is relevant to researchers and practitioners, but its contribution is incremental, as it builds on existing segmentation techniques.
The paper tackles the problem of resolution mismatch between overhead imagery and ground-truth labels in remote sensing semantic segmentation, introducing a method that uses low-resolution labels and an exemplar set of high-resolution labels to generate fine-grained predictions without requiring high-resolution annotations.
Though semantic segmentation has been heavily explored in the vision literature, unique challenges remain in the remote sensing domain. One such challenge is how to handle resolution mismatch between overhead imagery and ground-truth label sources, which arises from differences in ground sample distance. To illustrate this problem, we introduce a new dataset and use it to showcase weaknesses inherent in existing strategies that naively upsample the target label to match the image resolution. Instead, we present a method that is supervised using low-resolution labels (without upsampling), but takes advantage of an exemplar set of high-resolution labels to guide the learning process. Our method incorporates region aggregation, adversarial learning, and self-supervised pretraining to generate fine-grained predictions, without requiring high-resolution annotations. Extensive experiments demonstrate the real-world applicability of our approach.
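The abstract does not say how region aggregation is computed, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that aggregation means averaging per-pixel class probabilities over each block of high-resolution pixels covered by one low-resolution label cell, that the resolution ratio is an integer `factor`, and that each low-resolution cell carries a single hard class label. The function names (`aggregate_regions`, `region_aggregation_loss`, `naive_upsampled_loss`) are hypothetical, and the adversarial and self-supervised components are not shown. For contrast, `naive_upsampled_loss` implements the upsample-the-label baseline that the abstract criticizes.

```python
import math
from typing import List

Grid = List[List[float]]  # hypothetical layout: probs[class][row][col]


def aggregate_regions(probs: List[Grid], factor: int) -> List[Grid]:
    """Average per-pixel class probabilities over non-overlapping factor x factor
    blocks (an assumed aggregation rule), giving one distribution per label cell."""
    height, width = len(probs[0]), len(probs[0][0])
    if height % factor or width % factor:
        raise ValueError("image size must be a multiple of the resolution factor")
    area = factor * factor
    pooled = []
    for class_grid in probs:
        grid = []
        for bi in range(height // factor):
            row = []
            for bj in range(width // factor):
                total = sum(
                    class_grid[bi * factor + di][bj * factor + dj]
                    for di in range(factor)
                    for dj in range(factor)
                )
                row.append(total / area)
            grid.append(row)
        pooled.append(grid)
    return pooled


def region_aggregation_loss(probs, low_res_labels, factor, eps=1e-8):
    """Mean cross-entropy between block-averaged predictions and low-res labels."""
    pooled = aggregate_regions(probs, factor)
    losses = [
        -math.log(pooled[label][bi][bj] + eps)
        for bi, label_row in enumerate(low_res_labels)
        for bj, label in enumerate(label_row)
    ]
    return sum(losses) / len(losses)


def naive_upsampled_loss(probs, low_res_labels, factor, eps=1e-8):
    """Baseline: nearest-neighbour upsample the label, then per-pixel cross-entropy."""
    height, width = len(probs[0]), len(probs[0][0])
    losses = [
        -math.log(probs[low_res_labels[i // factor][j // factor]][i][j] + eps)
        for i in range(height)
        for j in range(width)
    ]
    return sum(losses) / len(losses)


# Toy example: a 4x4 prediction for two classes, labels at 2x coarser resolution.
building = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
]
background = [[1.0 - p for p in row] for row in building]
probs = [background, building]  # class 0 = background, class 1 = building
labels = [[1, 0], [0, 1]]       # one hard label per 2x2 block

agg = region_aggregation_loss(probs, labels, factor=2)
naive = naive_upsampled_loss(probs, labels, factor=2)
print(f"aggregated loss: {agg:.4f}, naive upsampled loss: {naive:.4f}")
```

In this toy case the aggregated loss only asks each 2x2 block to be mostly "building" or mostly "background" on average, so the fine boundary inside the top-left block costs little. The upsampling baseline instead treats the blocky label as pixel-accurate and heavily penalizes the one pixel that disagrees with it, which illustrates why supervising at the label's native resolution can preserve fine-grained structure.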