CV · Apr 16, 2022

Efficient Linear Attention for Fast and Accurate Keypoint Matching

arXiv:2204.07731v3 · 11 citations · h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses the computational bottleneck in 3D vision applications by improving efficiency and accuracy for sparse matching, though it is incremental as it builds on existing Transformer-based methods.

The paper tackles the inefficiency of Transformers in sparse keypoint matching by combining efficient linear attention with a new attentional aggregation method. It achieves competitive performance with only 0.84M parameters, against larger models such as SuperGlue (12M) and SGMNet (30M), on the HPatch, ETH, and Aachen Day-Night benchmarks.

Recently Transformers have provided state-of-the-art performance in sparse matching, crucial to realize high-performance 3D vision applications. Yet, these Transformers lack efficiency due to the quadratic computational complexity of their attention mechanism. To solve this problem, we employ an efficient linear attention for the linear computational complexity. Then, we propose a new attentional aggregation that achieves high accuracy by aggregating both the global and local information from sparse keypoints. To further improve the efficiency, we propose the joint learning of feature matching and description. Our learning enables simpler and faster matching than Sinkhorn, often used in matching the learned descriptors from Transformers. Our method achieves competitive performance with only 0.84M learnable parameters against the bigger SOTAs, SuperGlue (12M parameters) and SGMNet (30M parameters), on three benchmarks, HPatch, ETH, and Aachen Day-Night.
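
To illustrate how a linear attention avoids the quadratic cost mentioned in the abstract, here is a minimal NumPy sketch of one common kernelized formulation. It is an assumption for illustration, not the paper's exact attention: the `elu(x) + 1` feature map and the function names `feature_map`, `linear_attention`, and `linear_attention_reference` are hypothetical choices. The idea is to multiply keys with values first, so the N x M attention matrix between keypoints is never formed.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1, a positive feature map (assumed here for illustration).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(Q, K, V, eps=1e-6):
    # Q: (N, d) queries, K: (M, d) keys, V: (M, d_v) values.
    Qf, Kf = feature_map(Q), feature_map(K)
    KV = Kf.T @ V                 # (d, d_v): keys and values combined first
    Z = Qf @ Kf.sum(axis=0)       # (N,): per-query normalizer
    # Cost grows linearly in N and M; no (N, M) matrix is built.
    return (Qf @ KV) / (Z[:, None] + eps)

def linear_attention_reference(Q, K, V, eps=1e-6):
    # Same result computed the quadratic way, for checking only.
    Qf, Kf = feature_map(Q), feature_map(K)
    A = Qf @ Kf.T                 # (N, M) explicit attention weights
    return (A @ V) / (A.sum(axis=-1, keepdims=True) + eps)

rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 4))   # 5 query keypoints, 4-dim descriptors
K = rng.standard_normal((7, 4))   # 7 key keypoints
V = rng.standard_normal((7, 3))
out = linear_attention(Q, K, V)
```

Both functions compute the same output because matrix multiplication is associative; only the order of the products, and therefore the cost, differs.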

