CV | Oct 6, 2023

VI-Diff: Unpaired Visible-Infrared Translation Diffusion Model for Single Modality Labeled Visible-Infrared Person Re-identification

arXiv:2310.04122v15.014 citations | h-index: 12 | Has Code

Originality: Incremental advance

AI Analysis

The work addresses the high annotation cost of cross-modality person re-identification in surveillance applications, but it is incremental because it builds on existing image translation techniques.

The paper tackles Visible-Infrared person re-identification using labels from only a single modality. It proposes VI-Diff, a diffusion model for unpaired image translation, which outperforms existing diffusion and GAN models in the authors' experiments.

Visible-Infrared person re-identification (VI-ReID) in real-world scenarios poses a significant challenge due to the high cost of cross-modality data annotation. Different sensing cameras, such as RGB/IR cameras for good/poor lighting conditions, make it costly and error-prone to identify the same person across modalities. To overcome this, we explore the use of single-modality labeled data for the VI-ReID task, which is more cost-effective and practical. By labeling pedestrians in only one modality (e.g., visible images) and retrieving in another modality (e.g., infrared images), we aim to create a training set containing both originally labeled and modality-translated data using unpaired image-to-image translation techniques. In this paper, we propose VI-Diff, a diffusion model that effectively addresses the task of Visible-Infrared person image translation. Through comprehensive experiments, we demonstrate that VI-Diff outperforms existing diffusion and GAN models, making it a promising solution for VI-ReID with single-modality labeled data. Our approach can be a promising solution to the VI-ReID task with single-modality labeled data and serves as a good starting point for future study. Code will be available.
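
The training-set construction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `Sample`, `build_training_set`, and `toy_visible_to_infrared` are hypothetical names, and the toy translator (a per-pixel channel average) only stands in for a trained unpaired translation model such as VI-Diff.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass(frozen=True)
class Sample:
    """One labeled pedestrian image (hypothetical container, not from the paper)."""
    image: Any        # image data in whatever form the translator accepts
    person_id: int    # identity label, annotated in the source modality only
    modality: str     # "visible" or "infrared"


def build_training_set(
    labeled: Sequence[Sample],
    translate: Callable[[Any], Any],
    target_modality: str = "infrared",
) -> List[Sample]:
    """Combine the originally labeled samples with modality-translated copies.

    Each translated copy inherits the identity label of its source image,
    so no annotation is needed in the target modality.
    """
    translated = [
        Sample(translate(s.image), s.person_id, target_modality)
        for s in labeled
    ]
    return list(labeled) + translated


def toy_visible_to_infrared(rgb_image):
    """Assumed stand-in translator: average the RGB channels of each pixel.

    A real pipeline would call a trained translation model here instead.
    """
    return [[sum(pixel) / 3.0 for pixel in row] for row in rgb_image]


# Usage: one labeled visible image (1x1 pixel) becomes two training samples.
labeled_visible = [Sample([[(0.0, 0.3, 0.6)]], person_id=7, modality="visible")]
training_set = build_training_set(labeled_visible, toy_visible_to_infrared)
```

The translator is passed in as a callable so that the same construction applies whichever translation model is used; only the labeled modality and the translation direction change.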
