CV · May 23

Image-Conditioned Instance Prompt Network for Referring Remote Sensing Image Segmentation

arXiv:2605.24532 · 11.9
Predicted impact: top 66% in CV (last 90 days) · Originality: Incremental advance
AI Analysis

This work addresses the bottleneck of cross-modal feature fusion in referring remote sensing image segmentation, a task relevant to embodied perception.

The paper proposes ICIPNet, a referring remote sensing image segmentation model that introduces an Image-Conditioned Instance Prompt (ICIP) module and a Bilateral Information Fusion (BIF) module to improve cross-modal feature fusion. The model outperforms existing methods on RRSIS benchmarks.

Referring Remote Sensing Image Segmentation (RRSIS) is a situated, task-driven cross-modal task related to the embodied perception paradigm: a model must align visual-spatial features with linguistic intent to perceive the target precisely. Recent research has focused on refining the granularity of textual features and optimizing image-text feature fusion to better guide target feature representations. However, insufficient descriptive granularity and sensitivity to semantic shifts can create bottlenecks in cross-modal feature fusion. To address these issues, we propose the Image-Conditioned Instance Prompt Network (ICIPNet) with Bilateral Information Fusion. ICIPNet introduces an Image-Conditioned Instance Prompt (ICIP) module that generates self-adaptive visual and semantic representations without external knowledge, and a Bilateral Information Fusion (BIF) module that enhances feature fusion along the token and channel dimensions. Experiments demonstrate that ICIPNet outperforms existing RRSIS models.
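The abstract names the two modules but does not describe their internals. As a rough illustration only, the NumPy sketch below shows one plausible reading. ICIP is modeled as a set of learnable prompt queries that cross-attend to image tokens, so the prompts are derived from the image alone. BIF is modeled as token-wise cross-attention from image tokens to the prompt-augmented text tokens, followed by a channel-wise gate computed from pooled text features. All function names (`image_conditioned_prompts`, `bilateral_fusion`, `icip_bif_forward`), tensor shapes, and the specific attention and gating choices are hypothetical assumptions, not ICIPNet's actual architecture, and the inputs are random rather than trained features.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def cross_attend(queries: np.ndarray, keys_values: np.ndarray) -> np.ndarray:
    # Scaled dot-product attention; for brevity, keys and values share one tensor
    d = queries.shape[-1]
    weights = softmax(queries @ keys_values.T / np.sqrt(d), axis=-1)
    return weights @ keys_values

def image_conditioned_prompts(prompt_queries: np.ndarray, image_tokens: np.ndarray) -> np.ndarray:
    # Hypothetical ICIP step: K prompt queries (K, C) read from N image tokens (N, C),
    # so the resulting prompts depend only on the input image (no external knowledge)
    return prompt_queries + cross_attend(prompt_queries, image_tokens)

def bilateral_fusion(image_tokens: np.ndarray, text_tokens: np.ndarray) -> np.ndarray:
    # Hypothetical BIF, token dimension: each image token attends over all text tokens
    fused = image_tokens + cross_attend(image_tokens, text_tokens)
    # Hypothetical BIF, channel dimension: a per-channel gate in (0, 1) from pooled text
    gate = sigmoid(text_tokens.mean(axis=0))  # shape (C,)
    return fused * gate  # broadcast the same gate over all N image tokens

def icip_bif_forward(image_tokens, text_tokens, prompt_queries):
    prompts = image_conditioned_prompts(prompt_queries, image_tokens)   # (K, C)
    text_aug = np.concatenate([text_tokens, prompts], axis=0)           # (L + K, C)
    return bilateral_fusion(image_tokens, text_aug)                     # (N, C)

rng = np.random.default_rng(0)
N, L, K, C = 16, 6, 4, 32  # image tokens, word tokens, prompts, channels (arbitrary)
V = rng.standard_normal((N, C))
T = rng.standard_normal((L, C))
P = rng.standard_normal((K, C))
out = icip_bif_forward(V, T, P)
print(out.shape)  # (16, 32)
```

Replacing the mean-pooled sigmoid gate with a learned projection, or adding a symmetric text-to-image branch, would be equally consistent with the abstract's wording; the paper itself is the authority on which design ICIPNet actually uses.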
