CVMar 29, 2023Code
PMAA: A Progressive Multi-scale Attention Autoencoder Model for High-performance Cloud Removal from Multi-temporal Satellite ImageryXuechao Zou, Kai Li, Junliang Xing et al.
Satellite imagery analysis plays a pivotal role in remote sensing; however, information loss due to cloud cover significantly impedes its application. Although existing deep cloud removal models have achieved notable outcomes, they scarcely consider contextual information. This study introduces a high-performance cloud removal architecture, termed Progressive Multi-scale Attention Autoencoder (PMAA), which concurrently harnesses global and local information to construct robust contextual dependencies using a novel Multi-scale Attention Module (MAM) and a novel Local Interaction Module (LIM). PMAA establishes long-range dependencies of multi-scale features using MAM and modulates the reconstruction of fine-grained details utilizing LIM, enabling simultaneous representation of fine- and coarse-grained features at the same level. With the help of diverse and multi-scale features, PMAA consistently outperforms the previous state-of-the-art model CTGAN on two benchmark datasets. Moreover, PMAA boasts considerable efficiency advantages, with only 0.5% and 14.6% of the parameters and computational complexity of CTGAN, respectively. These comprehensive results underscore PMAA's potential as a lightweight cloud removal network suitable for deployment on edge devices to accomplish large-scale cloud removal tasks. Our source code and pre-trained models are available at https://github.com/XavierJiezou/PMAA.
LGJul 21, 2025
Multimodal Fine-grained Reasoning for Post Quality EvaluationXiaoxu Guo, Siyan Liang, Yachao Cui et al.
Accurately assessing post quality requires complex relational reasoning to capture nuanced topic-post relationships. However, existing studies face three major limitations: (1) treating the task as unimodal categorization, which fails to leverage multimodal cues and fine-grained quality distinctions; (2) introducing noise during deep multimodal fusion, leading to misleading signals; and (3) lacking the ability to capture complex semantic relationships like relevance and comprehensiveness. To address these issues, we propose the Multimodal Fine-grained Topic-post Relational Reasoning (MFTRR) framework, which mimics human cognitive processes. MFTRR reframes post-quality assessment as a ranking task and incorporates multimodal data to better capture quality variations. It consists of two key modules: (1) the Local-Global Semantic Correlation Reasoning Module, which models fine-grained semantic interactions between posts and topics at both local and global levels, enhanced by a maximum information fusion mechanism to suppress noise; and (2) the Multi-Level Evidential Relational Reasoning Module, which explores macro- and micro-level relational cues to strengthen evidence-based reasoning. We evaluate MFTRR on three newly constructed multimodal topic-post datasets and the public Lazada-Home dataset. Experimental results demonstrate that MFTRR significantly outperforms state-of-the-art baselines, achieving up to 9.52% NDCG@3 improvement over the best unimodal method on the Art History dataset.