CVApr 17

Towards Realistic Open-Vocabulary Remote Sensing Segmentation: Benchmark and Baseline

Bingyu Li, Tao Huo, Haocheng Dong, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li

arXiv:2604.1565273.8h-index: 13Has Code

AI Analysis

For researchers in remote sensing segmentation, this work provides a more realistic benchmark and a competitive baseline, though the method is incremental.

The authors introduce OVRSISBenchV2, a large-scale benchmark for open-vocabulary remote sensing segmentation with 170K images and 128 categories, and propose Pi-Seg, a baseline using positive-incentive noise to improve transferability. Pi-Seg achieves strong results on the new benchmark, demonstrating the importance of realistic evaluation.

Open-vocabulary remote sensing image segmentation (OVRSIS) remains underexplored due to fragmented datasets, limited training diversity, and the lack of evaluation benchmarks that reflect realistic geospatial application demands. Our previous \textit{OVRSISBenchV1} established an initial cross-dataset evaluation protocol, but its limited scope is insufficient for assessing realistic open-world generalization. To address this issue, we propose \textit{OVRSISBenchV2}, a large-scale and application-oriented benchmark for OVRSIS. We first construct \textbf{OVRSIS95K}, a balanced dataset of about 95K image--mask pairs covering 35 common semantic categories across diverse remote sensing scenes. Built upon OVRSIS95K and 10 downstream datasets, OVRSISBenchV2 contains 170K images and 128 categories, substantially expanding scene diversity, semantic coverage, and evaluation difficulty. Beyond standard open-vocabulary segmentation, it further includes downstream protocols for building extraction, road extraction, and flood detection, thereby better reflecting realistic geospatial application demands and complex deployment scenarios. We also propose \textbf{Pi-Seg}, a baseline for OVRSIS. Pi-Seg improves transferability through a \textbf{positive-incentive noise} mechanism, where learnable and semantically guided perturbations broaden the visual-text feature space during training. Extensive experiments on OVRSISBenchV1, OVRSISBenchV2, and downstream tasks show that Pi-Seg delivers strong and consistent results, particularly on the more challenging OVRSISBenchV2 benchmark. Our results highlight both the importance of realistic benchmark design and the effectiveness of perturbation-based transfer for OVRSIS. The code and datasets are available at \href{https://github.com/LiBingyu01/RSKT-Seg/tree/Pi-Seg}{LiBingyu01/RSKT-Seg/tree/Pi-Seg}.

View on arXiv PDF Code

Similar