CV, Feb 9

Geospatial-Reasoning-Driven Vocabulary-Agnostic Remote Sensing Semantic Segmentation

Chufeng Zhou, Jian Wang, Xinyuan Liu, Xiaokang Zhang

arXiv:2602.08206v11.5h-index: 2

Originality: Incremental advance

AI Analysis

The paper addresses misclassification in remote sensing land-cover mapping, though the contribution appears incremental because it builds on existing multimodal models.

The paper tackles semantic ambiguity in open-vocabulary remote sensing segmentation by introducing a Geospatial Reasoning Chain-of-Thought framework, which enhances scene understanding and is reported to outperform existing approaches on the LoveDA and GID5 benchmarks.

Open-vocabulary semantic segmentation has emerged as a promising research direction in remote sensing, enabling the recognition of diverse land-cover types beyond pre-defined category sets. However, existing methods predominantly rely on the passive mapping of visual features and textual embeddings. This "appearance-based" paradigm lacks geospatial contextual awareness, leading to severe semantic ambiguity and misclassification when encountering land-cover classes with similar spectral features but distinct semantic attributes. To address this, we propose a Geospatial Reasoning Chain-of-Thought (GR-CoT) framework designed to enhance the scene understanding capabilities of Multimodal Large Language Models (MLLMs), thereby guiding open-vocabulary segmentation models toward precise mapping. The framework comprises two collaborative components: an offline knowledge distillation stream and an online instance reasoning stream. The offline stream establishes fine-grained category interpretation standards to resolve semantic conflicts between similar land-cover types. During online inference, the framework executes a sequential reasoning process involving macro-scenario anchoring, visual feature decoupling, and knowledge-driven decision synthesis. This process generates an image-adaptive vocabulary that guides downstream models to achieve pixel-level alignment with correct geographical semantics. Extensive experiments on the LoveDA and GID5 benchmarks demonstrate the superiority of our approach.
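
The three-step online reasoning process described in the abstract can be pictured with a toy sketch. This is not the authors' implementation: the real framework uses an MLLM, whereas the code below swaps in simple rule-based stand-ins. Every name here (`KNOWLEDGE`, `Observation`, `anchor_scene`, `decouple_features`, `synthesize_vocabulary`, `image_adaptive_vocabulary`) and every category entry is a hypothetical assumption made for illustration only.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the offline stream's output: per-category
# "interpretation standards". Entries are illustrative, not from the paper.
KNOWLEDGE = {
    "agricultural land": {"scenes": {"rural"}, "cues": {"regular plots", "crop rows"}},
    "grassland": {"scenes": {"rural", "urban"}, "cues": {"irregular green cover"}},
    "building": {"scenes": {"rural", "urban"}, "cues": {"rectangular roofs"}},
    "water": {"scenes": {"rural", "urban"}, "cues": {"smooth dark surface"}},
}


@dataclass
class Observation:
    """Assumed pre-extracted description of one image (an MLLM would produce this)."""
    scene_hint: str
    cues: set


def anchor_scene(obs):
    """Step 1, macro-scenario anchoring: settle on a coarse scene type."""
    return obs.scene_hint if obs.scene_hint in {"urban", "rural"} else "unknown"


def decouple_features(obs):
    """Step 2, visual feature decoupling: normalize the individual visual cues."""
    return {cue.strip().lower() for cue in obs.cues}


def synthesize_vocabulary(scene, cues, knowledge):
    """Step 3, knowledge-driven synthesis: keep categories whose standard
    is compatible with the scene and matches at least one observed cue."""
    vocab = []
    for name, standard in knowledge.items():
        scene_ok = scene == "unknown" or scene in standard["scenes"]
        if scene_ok and cues & standard["cues"]:
            vocab.append(name)
    return sorted(vocab)


def image_adaptive_vocabulary(obs, knowledge=KNOWLEDGE):
    """Run the three steps in sequence and return the per-image vocabulary."""
    scene = anchor_scene(obs)
    cues = decouple_features(obs)
    return synthesize_vocabulary(scene, cues, knowledge)
```

In this sketch, the same green, row-like cues yield "agricultural land" only when the scene is anchored as rural, which mirrors the abstract's point that scene context can separate classes with similar appearance. The resulting list would then be passed as the text vocabulary to a downstream open-vocabulary segmentation model.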
