CV

Jul 11, 2024

SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning

Runmin Zhang, Jun Ma, Si-Yuan Cao, Lun Luo, Beinan Yu, Shu-Jie Chen, Junwei Li, Hui-Liang Shen

arXiv:2407.08148v111.310 citations

h-index: 16

Has Code

Originality Highly original

AI Analysis

This paper addresses the problem of aligning images from different modalities (e.g., satellite and map images) without labeled data, offering a novel unsupervised method with significant performance gains.

The paper tackles unsupervised cross-modal homography estimation by proposing SCPNet, which uses intra-modal self-supervised learning, correlation, and consistent feature map projection, achieving a 14.0% lower mean average corner error than the supervised approach MHN on the GoogleMap dataset.

We propose a novel unsupervised cross-modal homography estimation framework, SCPNet, based on intra-modal Self-supervised learning, Correlation, and consistent feature map Projection. The concept of intra-modal self-supervised learning is first presented to facilitate the unsupervised cross-modal homography estimation. The correlation-based homography estimation network and the consistent feature map projection are combined to form the learnable architecture of SCPNet, boosting the unsupervised learning framework. SCPNet is the first to achieve effective unsupervised homography estimation on the satellite-map cross-modal dataset GoogleMap under [-32,+32] offset on a 128x128 image, with a mean average corner error (MACE) 14.0% lower than that of the supervised approach MHN. We further conduct extensive experiments on several cross-modal/spectral and manually-made inconsistent datasets, on which SCPNet achieves state-of-the-art (SOTA) performance among unsupervised approaches, with MACEs 49.0%, 25.2%, 36.4%, and 10.7% lower than those of the supervised approach MHN. Source code is available at https://github.com/RM-Zhang/SCPNet.
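The evaluation setup quoted in the abstract (corner offsets in [-32,+32] on a 128x128 image, scored by MACE) can be illustrated with a minimal sketch. This is not taken from the SCPNet repository. It assumes that corner offsets are sampled uniformly and independently per corner, that corners sit at pixel coordinates 0 and 127, that the average corner error is the mean Euclidean distance over the four corners, and that the reported percentages are relative reductions with respect to the MHN baseline. All function names are hypothetical.

```python
import math
import random

IMAGE_SIZE = 128   # patch side length quoted in the abstract
MAX_OFFSET = 32    # corner offsets lie in [-32, +32]


def base_corners(size=IMAGE_SIZE):
    """Four corners of a size x size patch, clockwise from top-left (assumed convention)."""
    last = float(size - 1)
    return [(0.0, 0.0), (last, 0.0), (last, last), (0.0, last)]


def sample_perturbed_corners(rng, size=IMAGE_SIZE, max_offset=MAX_OFFSET):
    """Shift each corner by an independent uniform offset (assumed sampling scheme)."""
    return [
        (x + rng.uniform(-max_offset, max_offset),
         y + rng.uniform(-max_offset, max_offset))
        for x, y in base_corners(size)
    ]


def average_corner_error(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth corners of one pair."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def mean_average_corner_error(pairs):
    """MACE: the average corner error averaged over all (pred, gt) pairs in a test set."""
    errors = [average_corner_error(pred, gt) for pred, gt in pairs]
    return sum(errors) / len(errors)


def relative_reduction_percent(baseline, ours):
    """Percentage by which `ours` is lower than `baseline` (assumed reading of 'X% lower')."""
    return 100.0 * (baseline - ours) / baseline


if __name__ == "__main__":
    rng = random.Random(0)
    gt = sample_perturbed_corners(rng)
    # Illustrative prediction only: every corner is off by one pixel in x.
    pred = [(x + 1.0, y) for x, y in gt]
    print("ACE:", average_corner_error(pred, gt))
```

Under this reading, a statement such as "14.0% lower MACE than MHN" corresponds to `relative_reduction_percent(mace_mhn, mace_scpnet)` returning 14.0, where both arguments are MACE values measured on the same test set.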

