CV | Dec 6, 2022

AbHE: All Attention-based Homography Estimation

arXiv:2212.03029v3 | 18 citations | h-index: 22
AI Analysis

This work addresses a fundamental computer vision problem for image processing applications, presenting an incremental improvement by integrating transformer-based methods into an existing framework.

The paper tackles homography estimation for image alignment by proposing an all-attention-based model using Swin Transformer and cross non-local layers, achieving state-of-the-art performance in 8 Degree-of-Freedom estimation.
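To make the "8 Degree-of-Freedom" wording concrete: a homography is a 3x3 matrix defined only up to scale, so fixing its last entry to 1 leaves eight unknowns, and four point correspondences supply exactly eight linear equations. The sketch below solves that system in plain Python. It is a generic illustration of the geometry and is not taken from the paper; the function names `solve_linear`, `homography_from_points`, and `warp_point` are hypothetical, and the normalization h33 = 1 is an assumption that fails for the rare homographies with h33 = 0.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so each row operation updates both sides at once.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_points(src, dst):
    """Estimate a 3x3 homography (h33 fixed to 1) from four (x, y) -> (u, v) pairs."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]  # eight solved entries plus the fixed h33
    return [h[0:3], h[3:6], h[6:9]]


def warp_point(h, x, y):
    """Apply homography h to the point (x, y), including the perspective divide."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

For example, mapping the unit square's corners to the same square shifted by (2, 3) yields a pure translation matrix, and `warp_point` reproduces each target corner.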

Homography estimation is a basic computer vision task that aims to recover the transformation between multi-view images for image alignment. Unsupervised homography estimation trains a convolutional neural network for feature extraction and transformation matrix regression. While state-of-the-art homography methods are based on convolutional neural networks, little work has focused on transformers, which have shown superiority in high-level vision tasks. In this paper, we propose a strong baseline model based on the Swin Transformer, which combines a convolutional neural network for local features with a transformer module for global features. Moreover, a cross non-local layer is introduced to coarsely search for matched features within the feature maps. In the homography regression stage, we adopt an attention layer over the channels of the correlation volume, which can drop out some weakly correlated feature points. Experiments show that our method outperforms the state-of-the-art method in 8 degrees-of-freedom (DOF) homography estimation.
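The cross non-local idea in the abstract can be illustrated with a minimal sketch: every position in one image's feature map attends over all positions in the other image's feature map and aggregates the features it matches best. The code below is an assumption-laden simplification, not the authors' layer. It omits the learned projections and normalization a real non-local block would have, treats each feature map as a flat list of vectors, and uses the hypothetical names `softmax` and `cross_non_local`.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_non_local(feat_a, feat_b):
    """Attend from every position of feat_a over all positions of feat_b.

    feat_a, feat_b: lists of equal-length feature vectors, one per spatial
    position (a flattened H*W feature map). Returns feat_a plus the
    attention-weighted average of feat_b (a residual connection).
    """
    out = []
    for query in feat_a:
        # Dot-product similarity between this position and every position in feat_b.
        scores = [sum(q * k for q, k in zip(query, key)) for key in feat_b]
        weights = softmax(scores)
        # Weighted sum of feat_b vectors, computed channel by channel.
        attended = [sum(w * vec[c] for w, vec in zip(weights, feat_b))
                    for c in range(len(query))]
        out.append([q + a for q, a in zip(query, attended)])
    return out
```

With a query that strongly matches one position in the other map, the softmax weights concentrate on that position, which is the coarse matching behaviour the abstract describes. With an all-zero query, the weights are uniform and the output is the plain average of the other map's features.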

