CV, Mar 28, 2022

S2-Net: Self-supervision Guided Feature Representation Learning for Cross-Modality Images

arXiv:2203.14581v1, 5 citations, h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses cross-modality image matching for computer vision applications, presenting an incremental improvement through a novel training strategy.

The paper tackles the problem of cross-modality image matching by proposing S2-Net, a network that combines supervised and self-supervised learning to improve feature representation learning, achieving state-of-the-art performance on RoadScene and RGB-NIR datasets.

Combining the complementary strengths of cross-modality images can compensate for the information missing from any single modality, which has drawn increasing research attention to multi-modal image matching. However, because of the large appearance differences between cross-modality image pairs, it is often difficult to bring the feature representations of corresponding points close together. In this letter, we design a cross-modality feature representation learning network, S2-Net, based on the recently successful detect-and-describe pipeline, which was originally proposed for visible images and is adapted here to cross-modality image pairs. To address the resulting optimization difficulties, we introduce self-supervised learning with a well-designed loss function to guide training without discarding the original advantages. This strategy simulates image pairs in the same modality, which also serves as a useful guide for training on cross-modality images. Notably, it requires no additional data yet significantly improves performance, and it is applicable to all methods in the detect-and-describe pipeline. Extensive experiments evaluate the proposed strategy against both handcrafted and deep learning-based methods. Results show that our formulation, which jointly optimizes supervised and self-supervised learning, outperforms state-of-the-art methods on the RoadScene and RGB-NIR datasets.
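The abstract's central idea, a supervised cross-modality objective combined with a self-supervised objective on simulated same-modality pairs, can be illustrated with a rough sketch. Everything below is an assumption made for illustration: the function names, the translation-plus-gain simulation, the squared-distance loss, and the `weight` parameter are hypothetical placeholders, not the loss function or augmentation used in the paper.

```python
import numpy as np


def simulate_same_modality_pair(image, shift, gain=1.0):
    """Create a second view of `image` in the same modality (hypothetical helper).

    An integer translation with wrap-around keeps the pixel correspondence
    exact: pixel (y, x) in `image` maps to ((y + dy) % H, (x + dx) % W).
    A global gain stands in for a mild photometric change.
    """
    dy, dx = shift
    warped = np.roll(image, shift=(dy, dx), axis=(0, 1))
    return np.clip(warped * gain, 0.0, 1.0)


def descriptor_distance_loss(desc_a, desc_b):
    """Mean squared L2 distance between matched descriptors.

    Row i of `desc_a` is assumed to correspond to row i of `desc_b`.
    This is a placeholder, not the paper's loss.
    """
    return float(np.mean(np.sum((desc_a - desc_b) ** 2, axis=1)))


def combined_loss(cross_a, cross_b, same_a, same_b, weight=1.0):
    """Supervised cross-modality term plus a weighted self-supervised term.

    `cross_a`/`cross_b`: descriptors of matched points in a cross-modality pair.
    `same_a`/`same_b`: descriptors of matched points in a simulated
    same-modality pair. `weight` is an assumed balancing hyperparameter.
    """
    supervised = descriptor_distance_loss(cross_a, cross_b)
    self_supervised = descriptor_distance_loss(same_a, same_b)
    return supervised + weight * self_supervised
```

Because the second view is generated from the image itself, the self-supervised term needs no extra data or annotation, which is consistent with the abstract's claim that the strategy requires no additional data.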

