CV | Jul 16, 2024

Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning

arXiv:2407.11683v1 | 21 citations | h-index: 21 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of generating accurate captions for image changes in computer vision, though it appears incremental as it builds on existing methods to handle distractors.

The paper tackles the problem of change captioning under distractors like illumination and viewpoint changes by proposing a distractors-immune representation learning network with cross-modal contrastive regularization, achieving state-of-the-art performance on four public datasets.

Change captioning aims to succinctly describe the semantic change between a pair of similar images while remaining immune to distractors (illumination and viewpoint changes). Under these distractors, unchanged objects often exhibit pseudo changes in location and scale, and some objects may overlap others, resulting in perturbed and less discriminative features between the two images. However, most existing methods directly capture the difference between them, which risks obtaining error-prone difference features. In this paper, we propose a distractors-immune representation learning network that correlates the corresponding channels of two image representations and decorrelates different ones in a self-supervised manner, thus attaining a pair of stable image representations under distractors. The model can then better model the interaction between them to capture reliable difference features for caption generation. To yield words based on the most related difference features, we further design a cross-modal contrastive regularization, which regularizes the cross-modal alignment by maximizing the contrastive alignment between the attended difference features and the generated words. Extensive experiments show that our method outperforms state-of-the-art methods on four public datasets. The code is available at https://github.com/tuyunbin/DIRL.
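
The abstract names two objectives without giving their formulas, so the following is only a minimal sketch of how such losses are commonly written, not the authors' implementation (see the linked repository for that). It assumes a Barlow Twins-style cross-correlation loss for the "correlate corresponding channels, decorrelate different ones" idea and an InfoNCE loss for the contrastive alignment between difference features and word features. The function names, the `off_diag_weight` and `temperature` parameters, and the feature shapes are all hypothetical.

```python
import numpy as np


def channel_correlation_loss(feat_a, feat_b, off_diag_weight=0.005, eps=1e-8):
    """Barlow Twins-style loss on two (positions, channels) feature maps.

    Hypothetical helper: pushes the cross-correlation matrix of the two
    image representations toward the identity, so that matching channels
    correlate and different channels decorrelate.
    """
    # Standardize each channel over the position axis.
    a = (feat_a - feat_a.mean(axis=0)) / (feat_a.std(axis=0) + eps)
    b = (feat_b - feat_b.mean(axis=0)) / (feat_b.std(axis=0) + eps)
    n = feat_a.shape[0]
    corr = a.T @ b / n  # (channels, channels) cross-correlation matrix
    diag = np.diagonal(corr)
    on_diag = np.sum((1.0 - diag) ** 2)               # matching channels -> 1
    off_diag = np.sum(corr ** 2) - np.sum(diag ** 2)  # other channels -> 0
    return on_diag + off_diag_weight * off_diag


def contrastive_alignment_loss(diff_feats, word_feats, temperature=0.1):
    """InfoNCE over a batch; row i of each matrix is assumed to be a matched pair."""
    d = diff_feats / np.linalg.norm(diff_feats, axis=1, keepdims=True)
    w = word_feats / np.linalg.norm(word_feats, axis=1, keepdims=True)
    logits = d @ w.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diagonal(log_prob))
```

Under these assumptions, the first loss is small when the two representations agree channel by channel, and the second is small when each difference feature is most similar to its own caption's word feature rather than to another sample's.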

Code Implementations: 1 repo
