CV · AI · CL · Mar 29, 2025

Efficient Adaptation For Remote Sensing Visual Grounding

Hasan Moughnieh, Mohamad Chalhoub, Hasan Nasrallah, Cristiano Nattero, Paolo Campanella, Giovanni Nico, Ali J. Ghandour

arXiv:2503.23083v3 · 3.6 · h-index: 16 · IGARSS

Originality: Incremental advance

AI Analysis

This work addresses the underexplored problem of visual grounding in remote sensing and offers researchers and practitioners a practical, cost-efficient alternative to full model training. It is an incremental contribution, however, because it applies existing PEFT methods to a new domain.

The paper adapts pre-trained vision-language models to remote sensing visual grounding using Parameter-Efficient Fine-Tuning (PEFT) techniques, matching or surpassing state-of-the-art models while substantially reducing computational cost.

Adapting pre-trained models has become an effective strategy in artificial intelligence, offering a scalable and efficient alternative to training models from scratch. In remote sensing (RS), where visual grounding (VG) remains underexplored, this approach enables the deployment of powerful vision-language models to achieve robust cross-modal understanding while significantly reducing computational overhead. To this end, we applied Parameter-Efficient Fine-Tuning (PEFT) techniques to adapt these models to RS-specific VG tasks. Specifically, we evaluated LoRA placement across different modules in Grounding DINO, and used BitFit and adapters to fine-tune the OFA foundation model pre-trained on general-purpose VG datasets. This approach achieved performance comparable to or surpassing current state-of-the-art (SOTA) models at a substantially lower computational cost. This study highlights the potential of PEFT techniques to advance efficient and precise multi-modal analysis in RS, offering a practical and cost-effective alternative to full model training.
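To make the cost argument concrete, here is a minimal pure-Python sketch of how LoRA and BitFit limit what gets trained in a single linear layer. It is a toy illustration under stated assumptions, not the authors' code and not the Grounding DINO or OFA implementation: `LoRALinear`, `make_lora`, and `trainable_params` are hypothetical names, the `alpha / rank` scaling follows the common LoRA convention, and the BitFit count assumes only that layer's bias vector is updated. Adapters are not shown.

```python
import random
from dataclasses import dataclass


def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


@dataclass
class LoRALinear:
    """Hypothetical toy LoRA layer: frozen W and bias plus a trainable low-rank update B @ A."""
    weight: list  # frozen pre-trained W, shape (d_out, d_in)
    bias: list    # frozen pre-trained bias, shape (d_out,)
    a: list       # trainable down-projection A, shape (rank, d_in)
    b: list       # trainable up-projection B, shape (d_out, rank)
    alpha: float = 1.0

    @property
    def rank(self):
        return len(self.a)

    def forward(self, x):
        # h = W x + bias + (alpha / rank) * B (A x)
        base = matvec(self.weight, x)
        delta = matvec(self.b, matvec(self.a, x))
        scale = self.alpha / self.rank
        return [w + bi + scale * d for w, bi, d in zip(base, self.bias, delta)]


def make_lora(weight, bias, rank, alpha=1.0, seed=0):
    """Wrap a frozen layer; B starts at zero so the adapted layer initially matches the original."""
    rng = random.Random(seed)
    d_out, d_in = len(weight), len(weight[0])
    a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
    b = [[0.0] * rank for _ in range(d_out)]
    return LoRALinear(weight, bias, a, b, alpha)


def trainable_params(d_in, d_out, rank):
    """Trainable-parameter counts for one linear layer under each strategy (assumed setup)."""
    return {
        "full": d_out * d_in + d_out,      # every weight and bias
        "lora": rank * (d_in + d_out),     # only A and B
        "bitfit": d_out,                   # only the bias vector
    }
```

For a 256×256 layer with rank 8, `trainable_params(256, 256, 8)` returns `{'full': 65792, 'lora': 4096, 'bitfit': 256}`, which is why both methods update only a small fraction of the weights. Zero-initializing `B` is the usual design choice: training starts exactly from the pre-trained model's behavior.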
