CV · Mar 8, 2025

OpenRSD: Towards Open-prompts for Object Detection in Remote Sensing Images

arXiv:2503.06146v2 · 10 citations · h-index: 32
Originality: Incremental advance
AI Analysis

The work addresses the need for more generalizable, real-time object detection in remote sensing applications, though it appears incremental because it builds on existing open-vocabulary detection methods.

The paper tackles limited generalization in remote sensing object detection by proposing OpenRSD, an open-prompt framework that supports multimodal prompts and multi-task detection heads. Evaluated on seven datasets, it reports an 8.7% higher average precision than YOLO-World and an inference speed of 20.8 FPS.

Remote sensing object detection has made significant progress, but most studies still focus on closed-set detection, limiting generalization across diverse datasets. Open-vocabulary object detection (OVD) provides a solution by leveraging multimodal associations between text prompts and visual features. However, existing OVD methods for remote sensing (RS) images are constrained by small-scale datasets and fail to address the unique challenges of remote sensing interpretation, including oriented object detection and the need for both high precision and real-time performance in diverse scenarios. To tackle these challenges, we propose OpenRSD, a universal open-prompt RS object detection framework. OpenRSD supports multimodal prompts and integrates multi-task detection heads to balance accuracy and real-time requirements. Additionally, we design a multi-stage training pipeline to enhance the generalization of the model. Evaluated on seven public datasets, OpenRSD demonstrates superior performance in oriented and horizontal bounding box detection, with real-time inference capabilities suitable for large-scale RS image analysis. Compared to YOLO-World, OpenRSD exhibits an 8.7% higher average precision and achieves an inference speed of 20.8 FPS. Code and models will be released.

