CV · May 11, 2024

Semantic Guided Large Scale Factor Remote Sensing Image Super-resolution with Generative Diffusion Prior

arXiv:2405.07044v1 · 40 citations · h-index: 8 · Has Code · ISPRS Journal of Photogrammetry and Remote Sensing
Originality: Incremental advance
AI Analysis

This work addresses the challenge of enhancing low-resolution satellite data for remote sensing applications. It is an incremental advance whose novelty lies in combining semantic guidance with sensor-specific imaging characteristics.

The paper tackles large scale factor super-resolution for remote sensing images, where existing methods often fail to recover clear textures and correct ground objects. It introduces the Semantic Guided Diffusion Model (SGDM), which uses a pre-trained generative diffusion model as a prior and conditions the reconstruction on vector maps and sensor-specific imaging characteristics. The authors report superior performance on a new dataset and on downstream vision tasks.

Remote sensing images captured by different platforms exhibit significant disparities in spatial resolution. Large scale factor super-resolution (SR) algorithms are vital for maximizing the utilization of low-resolution (LR) satellite data captured from orbit. However, existing methods struggle to recover SR images with clear textures and correct ground objects. We introduce a novel framework, the Semantic Guided Diffusion Model (SGDM), designed for large scale factor remote sensing image super-resolution. The framework exploits a pre-trained generative model as a prior to generate perceptually plausible SR images. We further enhance the reconstruction by incorporating vector maps, which carry structural and semantic cues. Moreover, pixel-level inconsistencies in paired remote sensing images, stemming from sensor-specific imaging characteristics, may hinder the convergence of the model and limit the diversity of the generated results. To address this problem, we propose to extract the sensor-specific imaging characteristics and model their distribution, allowing diverse SR image generation based on imaging characteristics provided by reference images or sampled from the imaging characteristic probability distribution. To validate and evaluate our approach, we create the Cross-Modal Super-Resolution Dataset (CMSRD). Qualitative and quantitative experiments on CMSRD showcase the superiority and broad applicability of our method. Experimental results on downstream vision tasks also demonstrate the utility of the generated SR images. The dataset and code will be publicly available at https://github.com/wwangcece/SGDM
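
The abstract describes two ways of obtaining imaging characteristics at inference time: taking them from a reference image, or sampling them from a modeled distribution. The sketch below illustrates only that idea and is not the authors' implementation. It assumes, purely for illustration, that the "imaging characteristics" are per-channel mean and standard deviation and that their distribution is a diagonal Gaussian; the function names (`channel_stats`, `fit_gaussian`, `sample_style`, `apply_style`) are hypothetical, and the paper's learned extractor and distribution model may differ substantially.

```python
import numpy as np

def channel_stats(img):
    """Per-channel mean and std of an H x W x C image, used here as a
    stand-in "imaging characteristic" vector (an assumption, not SGDM's)."""
    flat = img.reshape(-1, img.shape[-1])
    return np.concatenate([flat.mean(axis=0), flat.std(axis=0)])

def fit_gaussian(style_vectors):
    """Fit a diagonal Gaussian to an N x D stack of style vectors."""
    mu = style_vectors.mean(axis=0)
    sigma = style_vectors.std(axis=0) + 1e-8
    return mu, sigma

def sample_style(mu, sigma, rng):
    """Draw one style vector from the fitted diagonal Gaussian."""
    return rng.normal(mu, sigma)

def apply_style(img, style):
    """Shift and scale each channel so its statistics match `style`."""
    c = img.shape[-1]
    target_mean, target_std = style[:c], np.abs(style[c:])
    flat = img.reshape(-1, c)
    mean, std = flat.mean(axis=0), flat.std(axis=0) + 1e-8
    return (img - mean) / std * target_std + target_mean

# Toy data standing in for high-resolution training images.
rng = np.random.default_rng(0)
images = rng.random((16, 8, 8, 3))
styles = np.stack([channel_stats(im) for im in images])
mu, sigma = fit_gaussian(styles)

content = rng.random((8, 8, 3))
# Option 1: characteristics taken from a reference image.
ref_style = channel_stats(images[0])
out_ref = apply_style(content, ref_style)
# Option 2: characteristics sampled from the fitted distribution.
out_sampled = apply_style(content, sample_style(mu, sigma, rng))
```

With a reference image the output reproduces that image's channel statistics, while repeated sampling yields different but plausible statistics, which is the source of diversity the abstract refers to.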

Code Implementations: 1 repo
