Cross-Scale Pretraining: Enhancing Self-Supervised Learning for Low-Resolution Satellite Imagery for Semantic Segmentation
This work addresses a domain-specific problem in remote sensing for semantic segmentation, offering an incremental improvement.
The paper tackles the problem of improving self-supervised learning for low-resolution satellite imagery by incorporating high-resolution datasets during pretraining. The resulting models achieve better downstream segmentation performance than models pretrained on either dataset alone.
Self-supervised pretraining in remote sensing is mostly done using mid-spatial-resolution (MR) image datasets due to their high availability. Given the release of high-resolution (HR) datasets, we ask how HR datasets can be included in self-supervised pretraining to enhance MR image representation learning and downstream segmentation performance on MR tasks. We design a spatial affinity component that can be added to existing self-supervised learning frameworks and that uses HR imagery to learn better representations of MR imagery. We test the spatial affinity component on two self-supervised learning frameworks and show that models pretrained with it outperform models pretrained on HR or MR images alone.
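The abstract does not specify how the spatial affinity component is formulated, so the following is only an illustrative sketch of one plausible reading, not the authors' method. It assumes that an affinity is the cosine similarity between feature vectors at pairs of spatial locations, that HR features are average-pooled onto the MR feature grid, and that the loss is the mean squared difference between the two affinity matrices. All function names (`affinity_matrix`, `pool_grid`, `affinity_loss`) are hypothetical, and plain Python is used so the sketch runs without dependencies.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # epsilon guards against zero vectors


def flatten(grid):
    """Turn a rows x cols grid of feature vectors into a flat list of vectors."""
    return [vec for row in grid for vec in row]


def affinity_matrix(features):
    """Pairwise cosine similarity between all spatial locations (N x N)."""
    return [[cosine(u, v) for v in features] for u in features]


def pool_grid(grid, factor):
    """Average-pool a feature grid over non-overlapping factor x factor windows.

    Assumes both grid dimensions are divisible by `factor`.
    """
    rows, cols, dim = len(grid), len(grid[0]), len(grid[0][0])
    pooled = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            window = [grid[i][j]
                      for i in range(r, r + factor)
                      for j in range(c, c + factor)]
            out_row.append([sum(v[k] for v in window) / len(window)
                            for k in range(dim)])
        pooled.append(out_row)
    return pooled


def affinity_loss(mr_grid, hr_grid, factor):
    """Mean squared difference between MR and pooled-HR affinity matrices.

    Assumption (not from the source): the HR grid covers the same footprint
    as the MR grid at `factor` times the spatial resolution.
    """
    mr_aff = affinity_matrix(flatten(mr_grid))
    hr_aff = affinity_matrix(flatten(pool_grid(hr_grid, factor)))
    n = len(mr_aff)
    total = sum((mr_aff[i][j] - hr_aff[i][j]) ** 2
                for i in range(n) for j in range(n))
    return total / (n * n)
```

Under these assumptions, the loss is near zero when the MR features already reproduce the spatial similarity structure of the pooled HR features, and it grows as the two structures diverge. In a real pretraining setup, a term like this would presumably be added to the base self-supervised objective and computed on encoder feature maps with a tensor library rather than nested lists.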