CV · May 22

SLIP-RS: Structured-Attribute Language-Image Pre-Training for Remote Sensing Object Detection

arXiv:2605.23144 · 86.7 · Has Code
Predicted impact: top 20% in CV · last 90 days · Originality: Highly original
AI Analysis

This work addresses the data scarcity bottleneck in remote sensing object detection by enabling fine-grained discriminability without exhaustive open-set enumeration, offering a practical solution for domain-specific vision tasks.

SLIP-RS introduces a structured-attribute decoupling paradigm for remote sensing object detection, replacing monolithic label learning with a finite, physically meaningful attribute space. It reports state-of-the-art fine-grained detection and cross-domain generalization, and contributes RS-Attribute-15M, a dataset with over 15 million attribute annotations.
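
The shift from monolithic labels to a finite attribute space can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the attribute names and values in `ATTRIBUTE_SPACE` are hypothetical and are not the RS-Attribute-15M schema, and `combinatorial_captions` is only one plausible reading of the abstract's "combinatorial attribute augmentation", not the authors' implementation. It shows that three small vocabularies already span 27 distinct object descriptions, and that one object can yield several partial captions that all remain true of it.

```python
from itertools import combinations, product
from math import prod

# Hypothetical finite attribute vocabularies (illustrative only; not the
# attribute schema actually used by SLIP-RS or RS-Attribute-15M).
ATTRIBUTE_SPACE = {
    "color": ["white", "gray", "red"],
    "shape": ["elongated", "rectangular", "circular"],
    "size": ["small", "medium", "large"],
}


def num_descriptions(space):
    """Count the full attribute assignments a finite space can express."""
    return prod(len(values) for values in space.values())


def all_descriptions(space):
    """Yield every full attribute assignment as a {name: value} dict."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def combinatorial_captions(obj, noun):
    """One caption per non-empty subset of an object's attributes.

    Each caption is a partial but still correct description of the object.
    """
    names = sorted(obj)
    captions = []
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            values = " and ".join(obj[name] for name in subset)
            captions.append(f"a {noun} that is {values}")
    return captions


if __name__ == "__main__":
    print(num_descriptions(ATTRIBUTE_SPACE))  # 27 = 3 * 3 * 3
    ship = {"color": "white", "shape": "elongated", "size": "large"}
    for caption in combinatorial_captions(ship, noun="ship"):
        print(caption)  # 2**3 - 1 = 7 captions in total
```

Because every caption is built from a subset of the object's true attributes, each could serve as a valid positive text for that object in a contrastive objective; whether SLIP-RS pairs captions this way is not stated in the summary.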

Existing language-image pre-training for remote sensing object detection is constrained by Monolithic Label Learning, which relies on exhaustively enumerating open-set categories via black-box data to acquire fine-grained representations, creating a dependency incompatible with the domain's inherent data scarcity. To transcend this bottleneck, we propose SLIP-RS, establishing a Structured-Attribute Decoupling Paradigm that maps the open-ended category space into a finite, physically meaningful attribute space, unlocking fine-grained discriminability via explicit structural logic. This paradigm is realized via two technical pillars: (1) Structured-Attribute Contrastive Learning, which enforces the learning of decoupled intrinsic visual logic via combinatorial attribute augmentation; and (2) Conformal Attribute Reliability Engine, which leverages conformal prediction theory to rigorously distill high-fidelity supervision from noisy sources, yielding RS-Attribute-15M, the largest dataset with over 15 million attribute annotations. Extensive experiments demonstrate that SLIP-RS establishes unprecedented performance in fine-grained detection and cross-domain generalization, validating structured attributes as a vital foundation for remote sensing. Code: https://github.com/facias914/SLIP-RS.
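
The Conformal Attribute Reliability Engine is described only at a high level. As a hedged illustration of the underlying tool, the sketch below applies standard split conformal prediction to decide which noisy attribute labels to keep. Assumptions, none taken from the paper: a model outputs a probability per attribute value, a small calibration set with trusted labels is available, the nonconformity score is `1 - p(label)`, and a noisy label is kept only when the conformal prediction set is exactly that label. All function names and the toy numbers are hypothetical.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal threshold for the score s = 1 - p(true label).

    cal_probs:  list of dicts mapping attribute value -> model probability.
    cal_labels: trusted attribute values for the calibration items.
    Returns q_hat; the sets {y : 1 - p(y) <= q_hat} then contain the true
    value with probability >= 1 - alpha, assuming exchangeable data.
    """
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return math.inf  # too few calibration points: sets include every value
    return scores[k - 1]


def prediction_set(probs, q_hat):
    """All attribute values whose score does not exceed the threshold."""
    return {y for y, p in probs.items() if 1.0 - p <= q_hat}


def filter_noisy_labels(items, q_hat):
    """Keep (probs, noisy_label) pairs whose set is exactly {noisy_label}.

    Hypothetical reliability rule: empty or multi-valued sets are treated
    as too uncertain, and a label outside the set as likely noise.
    """
    return [
        (probs, label)
        for probs, label in items
        if prediction_set(probs, q_hat) == {label}
    ]


if __name__ == "__main__":
    # Toy calibration data for a binary "size" attribute (made-up numbers).
    true_probs = [0.95, 0.9, 0.9, 0.85, 0.8, 0.8, 0.75, 0.7, 0.6, 0.4]
    cal_probs = [{"small": p, "large": 1 - p} for p in true_probs]
    cal_labels = ["small"] * len(true_probs)
    q_hat = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
    noisy = [
        ({"small": 0.9, "large": 0.1}, "small"),  # confident and agrees
        ({"small": 0.9, "large": 0.1}, "large"),  # contradicts the model
        ({"small": 0.5, "large": 0.5}, "small"),  # too uncertain: empty set
    ]
    kept = filter_noisy_labels(noisy, q_hat)
    print(round(q_hat, 3), [label for _, label in kept])  # 0.4 ['small']
```

Under exchangeability the sets contain the true value with probability at least `1 - alpha`; this marginal guarantee is what makes a conformal filter more principled than a fixed confidence cutoff, though the paper's exact scoring and acceptance rule may differ.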
