CV · Mar 3, 2024

SA-MixNet: Structure-aware Mixup and Invariance Learning for Scribble-supervised Road Extraction in Remote Sensing Images

arXiv:2403.01381v1 · 6 citations · h-index: 32 · IEEE Trans Geosci Remote Sens
Originality: Incremental advance
AI Analysis

This work addresses a domain-specific problem for remote sensing applications by enhancing road extraction accuracy in a weakly supervised setting, offering a plug-and-play solution with incremental improvements over existing methods.

The paper tackles performance degradation in weakly supervised road extraction from remote sensing images, which it attributes to poor model invariance across varying scenes. It proposes SA-MixNet, which improves invariance through structure-aware mixup and regularization, and reports IoU gains over the state of the art of 1.47%, 2.12%, and 4.09% on the DeepGlobe, Wuhan, and Massachusetts datasets, respectively.

Mainstream weakly supervised road extractors rely on highly confident pseudo-labels propagated from scribbles, and their performance often degrades gradually as image scenes become more varied. We argue that this degradation stems from the model's poor invariance to scenes of different complexity, whereas existing solutions to this problem are commonly based on crafted priors that cannot be derived from scribbles. To eliminate the reliance on such priors, we propose a novel Structure-aware Mixup and Invariance Learning framework (SA-MixNet) for weakly supervised road extraction that improves model invariance in a data-driven manner. Specifically, we design a structure-aware Mixup scheme that pastes road regions from one image onto another, creating an image scene with increased complexity while preserving the roads' structural integrity. An invariance regularization is then imposed on the predictions for the constructed and original images to minimize their conflicts, which forces the model to behave consistently across scenes. Moreover, a discriminator-based regularization is designed to enhance connectivity while preserving the structure of roads. Combining these designs, our framework demonstrates superior performance on the DeepGlobe, Wuhan, and Massachusetts datasets, outperforming state-of-the-art techniques by 1.47%, 2.12%, and 4.09% in IoU, respectively, and showing its plug-and-play potential. The code will be made publicly available.
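The paste-and-regularize idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `paste_roads` and `invariance_penalty`, the use of a binary road mask as the paste region, and the mean-squared-error form of the penalty are all assumptions made for illustration. The abstract does not specify how road regions are selected or which loss is used.

```python
import numpy as np


def paste_roads(src_img, src_mask, dst_img, dst_mask):
    """Hypothetical paste step: copy road pixels of src onto dst.

    src_img, dst_img: float arrays of shape (H, W, C).
    src_mask, dst_mask: binary road masks of shape (H, W).
    Returns the mixed image and the union of both road masks.
    """
    road = src_mask.astype(bool)
    mixed_img = dst_img.copy()
    # Boolean indexing over (H, W) copies whole pixels (all channels).
    mixed_img[road] = src_img[road]
    mixed_mask = np.logical_or(dst_mask.astype(bool), road)
    return mixed_img, mixed_mask


def invariance_penalty(pred_mixed, pred_src, pred_dst, src_mask):
    """Hypothetical consistency term (MSE is an assumption).

    Compares the prediction on the mixed image with the prediction
    obtained by compositing the two original predictions the same way
    the images were composited.
    """
    road = src_mask.astype(bool)
    composed = np.where(road, pred_src, pred_dst)
    return float(np.mean((pred_mixed - composed) ** 2))
```

In this sketch the penalty is zero exactly when the model's output on the mixed scene matches the composited outputs on the two original scenes, which is one simple way to express "behaving consistently on various scenes".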

