CV | Feb 1, 2024

InfMAE: A Foundation Model in the Infrared Modality

arXiv:2402.00407v2 | 24 citations | h-index: 10 | ECCV
AI Analysis

This work addresses the lack of large-scale data and specialized models for the infrared vision community, representing an incremental advancement in domain-specific foundation models.

The paper tackles the problem of designing a foundation model for infrared vision by proposing InfMAE, which includes a new dataset, an information-aware masking strategy, a multi-scale encoder, and an infrared decoder, and it outperforms existing methods in three downstream tasks.

In recent years, foundation models have swept the computer vision field and facilitated the development of various tasks across different modalities. However, how to design an infrared foundation model remains an open question. In this paper, we propose InfMAE, a foundation model in the infrared modality. We release an infrared dataset, called Inf30, to address the lack of large-scale data for self-supervised learning in the infrared vision community. In addition, we design an information-aware masking strategy suited to infrared images. This masking strategy places greater emphasis on regions with richer information in infrared images during self-supervised learning, which is conducive to learning generalized representations. We also adopt a multi-scale encoder to improve the performance of the pre-trained encoder in downstream tasks. Finally, because infrared images contain little detail and texture information, we design an infrared decoder module, which further improves performance on downstream tasks. Extensive experiments show that InfMAE outperforms other supervised and self-supervised learning methods in three downstream tasks.
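To make the idea of an information-aware masking strategy concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the abstract does not say how "information" is measured or which patches end up masked. The sketch assumes per-patch intensity variance as a stand-in score, and assumes that the highest-scoring patches stay visible while the rest are masked. The names `patch_scores` and `information_aware_mask` are hypothetical.

```python
from statistics import pvariance


def patch_scores(image, patch):
    """Return a row-major list of per-patch intensity variances.

    `image` is a 2-D list of grayscale values whose height and width
    are multiples of `patch`. Variance is an assumed stand-in for
    "information"; the source does not specify the actual measure.
    """
    rows, cols = len(image), len(image[0])
    scores = []
    for top in range(0, rows, patch):
        for left in range(0, cols, patch):
            pixels = [
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ]
            scores.append(pvariance(pixels))
    return scores


def information_aware_mask(image, patch, mask_ratio):
    """Return a row-major list of booleans, True meaning "masked".

    Assumption: the highest-scoring patches stay visible and the
    lowest-scoring ones are masked. The paper's rule may differ.
    """
    scores = patch_scores(image, patch)
    num_masked = int(len(scores) * mask_ratio)
    # Order patch indices from lowest to highest score, mask the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    masked = set(order[:num_masked])
    return [i in masked for i in range(len(scores))]


# A 4x4 toy image: flat background with one textured 2x2 corner.
toy = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 9, 1],
    [0, 0, 2, 8],
]
mask = information_aware_mask(toy, patch=2, mask_ratio=0.75)
```

With this toy input, the three flat patches are masked and the textured corner stays visible. Swapping the score function or inverting the selection would give a different variant of the same idea.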

Code Implementations: 1 repo