CV · May 17, 2025

TinyRS-R1: Compact Multimodal Language Model for Remote Sensing

arXiv:2505.12099v2 · 10.24 citations · h-index: 2 · IEEE Geoscience and Remote Sensing Letters

Originality: Incremental advance

AI Analysis

TinyRS-R1 enables efficient remote-sensing applications on resource-constrained devices. The contribution is incremental, however, because it builds on existing models and methods.

The paper tackles the problem of running multimodal language models on edge hardware for remote sensing by introducing TinyRS-R1, a 2B-parameter model that matches or exceeds the performance of larger 7B-parameter models while reducing memory and latency by two-thirds.
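
A rough check on the memory claim is that the weight footprint alone scales with parameter count. The sketch below is a back-of-envelope estimate, not the paper's measurement. It assumes nominal counts of exactly 2B and 7B parameters stored as 16-bit values, and it ignores activations, the KV cache, and runtime overhead. The helper name `weight_memory_gb` is hypothetical.

```python
# Back-of-envelope weight memory. Assumptions: nominal parameter counts,
# 16-bit (2-byte) weights, and no activations, KV cache, or runtime overhead.
BYTES_PER_PARAM_FP16 = 2


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Return the memory needed to hold the weights, in gigabytes (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


small = weight_memory_gb(2e9)  # 2B-parameter model (nominal)
large = weight_memory_gb(7e9)  # 7B-parameter model (nominal)
ratio = small / large

print(f"2B: {small:.1f} GB, 7B: {large:.1f} GB, ratio: {ratio:.2f}")
# prints "2B: 4.0 GB, 7B: 14.0 GB, ratio: 0.29"
```

The resulting ratio of about 0.29 (2/7) falls in the same range as the reported one-third. Measured memory and latency also depend on the inference stack, so the two figures need not match exactly.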

Remote-sensing applications often run on edge hardware that cannot host today's 7B-parameter multimodal language models. This paper introduces TinyRS, the first 2B-parameter multimodal small language model (MSLM) optimized for remote sensing tasks, and TinyRS-R1, its reasoning-augmented variant. Built upon Qwen2-VL-2B, TinyRS is trained through a four-stage pipeline: pre-training on million satellite images, instruction tuning on visual instruction examples, fine-tuning with Chain-of-Thought (CoT) annotations from the proposed reasoning dataset, and alignment via Group Relative Policy Optimization (GRPO). TinyRS-R1 achieves or surpasses the performance of recent 7B-parameter remote sensing models across classification, VQA, visual grounding, and open-ended question answering, while requiring just one-third of the memory and latency. Our analysis shows that CoT reasoning substantially benefits spatial grounding and scene understanding, while the non-reasoning TinyRS excels in concise, latency-sensitive VQA tasks. TinyRS-R1 represents the first domain-specialized MSLM with GRPO-aligned CoT reasoning for general-purpose remote sensing.
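
GRPO replaces a learned value model with a group baseline. Several responses are sampled for the same prompt, and each response's advantage is its reward normalized by the mean and standard deviation of that group. The following minimal sketch covers only this advantage step and assumes the standard GRPO formulation. It omits the clipped policy-ratio objective and the KL penalty. The reward values and the function name `group_relative_advantages` are hypothetical, because the summary does not describe TinyRS-R1's reward design.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and std of its own group.

    In GRPO, several responses are sampled for the same prompt. Each response's
    advantage is its reward relative to the group, so no learned value model
    (critic) is needed. eps avoids division by zero when all rewards are equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one remote-sensing question
# (assumed scheme: 1.0 = correct answer, 0.0 = wrong answer).
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # prints [1.0, -1.0, 1.0, -1.0]
```

Responses that beat their group's average receive positive advantages and are reinforced. A group in which every response earns the same reward contributes no learning signal. This sketch uses the population standard deviation, while some implementations use the sample standard deviation instead.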

