CV | May 2, 2024

CromSS: Cross-modal pre-training with noisy labels for remote sensing image segmentation

arXiv:2405.01217v3 | 9 citations | h-index: 14 | IEEE Trans Geosci Remote Sens
Originality: Incremental advance
AI Analysis

This work addresses how to exploit low-cost noisy labels for geospatial segmentation. It offers a domain-specific solution whose methodological improvements are incremental.

The paper tackles feature learning for remote sensing image segmentation by pretraining models on large-scale noisy labels. It proposes the CromSS method to mitigate label noise and improve cross-modal consistency, and reports significant improvements on downstream tasks using datasets such as DFC2020.

We explore the potential of large-scale noisily labeled data to enhance feature learning by pretraining semantic segmentation models within a multi-modal framework for geospatial applications. We propose a novel Cross-modal Sample Selection (CromSS) method, a weakly supervised pretraining strategy designed to improve feature representations through cross-modal consistency and noise mitigation techniques. Unlike conventional pretraining approaches, CromSS exploits massive amounts of noisy and easy-to-come-by labels for improved feature learning beneficial to semantic segmentation tasks. We investigate middle and late fusion strategies to optimize the multi-modal pretraining architecture design. We also introduce a cross-modal sample selection module to mitigate the adverse effects of label noise, which employs a cross-modal entangling strategy to refine the estimated confidence masks within each modality to guide the sampling process. Additionally, we introduce a spatial-temporal label smoothing technique to counteract overconfidence for enhanced robustness against noisy labels. To validate our approach, we assembled the multi-modal dataset, NoLDO-S12, which consists of a large-scale noisy label subset from Google's Dynamic World (DW) dataset for pretraining and two downstream subsets with high-quality labels from Google DW and OpenStreetMap (OSM) for transfer learning. Experimental results on two downstream tasks and the publicly available DFC2020 dataset demonstrate that when effectively utilized, the low-cost noisy labels can significantly enhance feature learning for segmentation tasks. All data, code, and pretrained weights will be made publicly available.
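The abstract describes sample selection only at a high level, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes two modality branches that each produce per-pixel class logits, scores each pixel by the probability assigned to its noisy label, and keeps the most trusted fraction. Averaging the two confidence maps stands in for the paper's cross-modal entangling strategy. Plain uniform label smoothing stands in for the spatial-temporal variant. All function names, the `keep_ratio` parameter, and the `eps` value are hypothetical.

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def label_confidence(logits, labels):
    """Probability each pixel's model output assigns to its (noisy) label.

    logits: float array of shape (C, H, W); labels: int array of shape (H, W).
    """
    probs = softmax(logits, axis=0)
    return np.take_along_axis(probs, labels[None], axis=0)[0]


def cross_modal_selection(logits_a, logits_b, labels, keep_ratio=0.5):
    """Boolean mask of pixels whose noisy label both modalities tend to trust.

    Assumption: the two confidence maps are combined by a simple average;
    the paper's entangling strategy may differ.
    """
    joint = 0.5 * (label_confidence(logits_a, labels)
                   + label_confidence(logits_b, labels))
    k = max(1, int(round(keep_ratio * joint.size)))
    threshold = np.sort(joint, axis=None)[-k]  # k-th largest confidence
    return joint >= threshold


def smooth_labels(labels, num_classes, eps=0.1):
    """Uniform label smoothing (not the paper's spatial-temporal variant)."""
    one_hot = np.eye(num_classes)[labels]  # shape (H, W, C)
    return (1.0 - eps) * one_hot + eps / num_classes
```

In a training loop under these assumptions, the mask would restrict the segmentation loss to the selected pixels, and the smoothed targets would replace hard one-hot labels to damp overconfidence on mislabeled pixels.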

