CV | May 9, 2024

Exploring Text-Guided Single Image Editing for Remote Sensing Images

arXiv:2405.05769v4 | 3 citations | Has Code | IEEE J Sel Top Appl Earth Obs Remote Sens
Originality: Synthesis-oriented
AI Analysis

This paper addresses a domain-specific need for remote sensing image editing and offers a practical solution for tasks such as disaster assessment, though it is incremental in that it adapts existing techniques to this niche.

The paper tackles text-guided editing of remote sensing images, which has received less attention than generation. It proposes a method that can be trained on a single image and that improves CLIP scores and subjective evaluations over existing methods.

Artificial intelligence generative content (AIGC) has significantly impacted image generation in the field of remote sensing. However, the equally important area of remote sensing image (RSI) editing has not received sufficient attention. Deep learning based editing methods generally involve two sequential stages: generation and editing. For natural images, these stages primarily rely on generative backbones pre-trained on large-scale benchmark datasets and on text guidance provided by vision-language models (VLMs). However, this approach becomes less viable for RSIs. First, existing generative RSI benchmark datasets do not fully capture the diversity of RSIs and are often inadequate for universal editing tasks. Second, a single text semantic can correspond to multiple image semantics, which introduces incorrect semantics. To solve these problems, this paper proposes a text-guided RSI editing method that can be trained using only a single image. A multi-scale training approach is adopted to preserve consistency without the need for training on extensive benchmarks, while RSI pre-trained VLMs and prompt ensembling (PE) are leveraged to ensure accuracy and controllability. Experimental results on multiple RSI editing tasks show that the proposed method offers significant advantages in both CLIP scores and subjective evaluations compared to existing methods. Additionally, we explore the ability of the edited RSIs to support disaster assessment tasks in order to validate their practicality. Code will be released at https://github.com/HIT-PhilipHan/remote_sensing_image_editing.
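
The abstract names prompt ensembling (PE) and CLIP scores but does not spell out either. The sketch below shows the common CLIP-style form of prompt ensembling: several phrasings of one edit target are embedded, normalized, averaged, and normalized again, and the result is compared to another embedding by cosine similarity. It is an illustration under stated assumptions, not the paper's implementation. `toy_text_encoder`, the templates, and the example concept are all hypothetical, and a real pipeline would use an RSI pre-trained VLM's text and image encoders in their place.

```python
import hashlib
import math


def toy_text_encoder(text, dim=8):
    """Hypothetical stand-in for a VLM text encoder (assumption).

    Derives a deterministic pseudo-embedding from a hash of the text so the
    example runs without any model weights. It carries no real semantics.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Map each byte to a non-zero value in roughly [-1, 1].
    return [(b - 127.5) / 127.5 for b in digest[:dim]]


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def ensemble_prompt_embedding(concept, templates, encoder=toy_text_encoder):
    """Average the normalized embeddings of several prompts for one concept."""
    embeddings = [l2_normalize(encoder(t.format(concept))) for t in templates]
    dim = len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    # Re-normalize so the ensembled embedding is again a unit vector.
    return l2_normalize(mean)


def cosine_similarity(a, b):
    """Cosine similarity, the quantity that CLIP-style scores are based on."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


# Hypothetical templates; the paper's actual prompt set is not given here.
TEMPLATES = [
    "a satellite image of {}",
    "an aerial photo of {}",
    "a remote sensing image of {}",
]

text_embedding = ensemble_prompt_embedding("a flooded village", TEMPLATES)
```

In an editing loop, `text_embedding` would serve as the text-side target, and `cosine_similarity` would be evaluated between it and the image encoder's embedding of the edited RSI. Averaging over several phrasings is one way to reduce the risk that a single prompt maps to an unintended image semantic.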

Code Implementations: 1 repo
