CV · AI · May 29, 2025

PAN-Crafter: Learning Modality-Consistent Alignment for PAN-Sharpening

arXiv:2505.23367v2 · 5 citations · h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses a fundamental challenge in remote sensing, with applications such as environmental monitoring, though it is an incremental improvement over existing deep learning methods.

The paper tackles cross-modality misalignment in PAN-sharpening, which causes spectral distortion and blurring. It proposes PAN-Crafter, a modality-consistent alignment framework that achieves state-of-the-art performance with 50.11× faster inference and 0.63× the memory size of the most recent state-of-the-art method.

PAN-sharpening aims to fuse high-resolution panchromatic (PAN) images with low-resolution multi-spectral (MS) images to generate high-resolution multi-spectral (HRMS) outputs. However, cross-modality misalignment -- caused by sensor placement, acquisition timing, and resolution disparity -- poses a fundamental challenge. Conventional deep learning methods assume perfect pixel-wise alignment and rely on per-pixel reconstruction losses, leading to spectral distortion, double edges, and blurring when misalignment is present. To address this, we propose PAN-Crafter, a modality-consistent alignment framework that explicitly mitigates the misalignment gap between PAN and MS modalities. At its core, Modality-Adaptive Reconstruction (MARs) enables a single network to jointly reconstruct HRMS and PAN images, leveraging PAN's high-frequency details as auxiliary self-supervision. Additionally, we introduce Cross-Modality Alignment-Aware Attention (CM3A), a novel mechanism that bidirectionally aligns MS texture to PAN structure and vice versa, enabling adaptive feature refinement across modalities. Extensive experiments on multiple benchmark datasets demonstrate that PAN-Crafter outperforms the most recent state-of-the-art method in all metrics, even with 50.11× faster inference time and 0.63× the memory size. Furthermore, it demonstrates strong generalization performance on unseen satellite datasets, showing its robustness across different conditions.
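The abstract's claim that per-pixel losses lead to blurring under misalignment can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes a 1-D step edge, a hypothetical 2-pixel shift between the PAN structure and the MS ground truth, and a plain mean-absolute-error loss. The helper names (`step_edge`, `l1_loss`) are made up for illustration.

```python
import numpy as np


def step_edge(width: int = 16, edge_at: int = 8) -> np.ndarray:
    """1-D intensity profile: 0 left of the edge, 1 from the edge onward."""
    profile = np.zeros(width, dtype=np.float64)
    profile[edge_at:] = 1.0
    return profile


def l1_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute per-pixel error."""
    return float(np.mean(np.abs(pred - target)))


# Assumption: the PAN image places the edge at pixel 8, while the
# misaligned ground truth places the same edge at pixel 10.
pan_edge = step_edge(edge_at=8)
gt_edge = step_edge(edge_at=10)

# A sharp prediction that follows the PAN structure exactly.
sharp_pred = pan_edge
# A blurred prediction that averages the two edge positions.
blurred_pred = 0.5 * (pan_edge + gt_edge)

loss_sharp = l1_loss(sharp_pred, gt_edge)
loss_blurred = l1_loss(blurred_pred, gt_edge)

print(f"sharp:   {loss_sharp:.4f}")
print(f"blurred: {loss_blurred:.4f}")
```

In this toy setting the blurred prediction scores a lower loss than the sharp one, even though the sharp edge is the more faithful structure. That is the kind of incentive an alignment-aware design is meant to remove.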
