CV, CL, MM | Mar 6, 2023

Neighborhood Contrastive Transformer for Change Captioning

arXiv:2303.03171v1 | 33 citations | h-index: 28 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of generating accurate natural language descriptions for fine-grained changes in images, which is important for applications like surveillance and robotics, though it is incremental in nature.

The paper tackles the problem of change captioning, which involves describing semantic changes between similar images, by proposing a neighborhood contrastive transformer that improves perception of changes and understanding of syntax, achieving state-of-the-art performance on three public datasets.

Change captioning aims to describe, in natural language, the semantic change between a pair of similar images. It is more challenging than general image captioning because it requires capturing fine-grained change information while remaining immune to irrelevant viewpoint changes, and resolving syntactic ambiguity in change descriptions. In this paper, we propose a neighborhood contrastive transformer to improve the model's ability to perceive various changes under different scenes and to understand complex syntactic structure. Concretely, we first design neighboring feature aggregating to integrate neighboring context into each feature, which helps quickly locate inconspicuous changes under the guidance of conspicuous referents. Then, we devise common feature distilling to compare the two images at the neighborhood level and extract common properties from each image, so as to learn effective contrastive information between them. Finally, we introduce explicit dependencies between words to calibrate the transformer decoder, which helps it better understand complex syntactic structure during training. Extensive experimental results demonstrate that the proposed method achieves state-of-the-art performance on three public datasets with different change scenarios. The code is available at https://github.com/tuyunbin/NCT.
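
The first two steps of the abstract (aggregating neighboring context, then distilling what the two images have in common) can be illustrated with a toy sketch. This is not the authors' implementation, which lives in the linked repository. It assumes each image is a small grid of feature vectors, uses a plain window average as a stand-in for neighboring feature aggregating, and uses dot-product attention as a stand-in for common feature distilling. All function names (`aggregate_neighbors`, `distill_common`, `change_features`) and the `radius` parameter are hypothetical. The word-dependency calibration of the decoder is not covered.

```python
import math


def aggregate_neighbors(grid, radius=1):
    """Average each cell's feature vector with its in-bounds neighbors.

    Assumed stand-in for neighboring feature aggregating: a plain
    (2*radius+1)^2 window mean, clipped at the grid border.
    """
    h, w, c = len(grid), len(grid[0]), len(grid[0][0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc, n = [0.0] * c, 0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        acc = [a + v for a, v in zip(acc, grid[y][x])]
                        n += 1
            row.append([a / n for a in acc])
        out.append(row)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distill_common(query_grid, key_grid):
    """For each query cell, return an attention-weighted mix of key cells.

    Assumed stand-in for common feature distilling: scaled dot-product
    attention from one image's cells onto the other image's cells.
    """
    keys = [vec for row in key_grid for vec in row]
    scale = math.sqrt(len(keys[0]))
    out = []
    for row in query_grid:
        out_row = []
        for q in row:
            scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
            weights = softmax(scores)
            mixed = [sum(wt * k[d] for wt, k in zip(weights, keys))
                     for d in range(len(q))]
            out_row.append(mixed)
        out.append(out_row)
    return out


def subtract(grid_a, grid_b):
    """Element-wise difference of two feature grids of the same shape."""
    return [[[x - y for x, y in zip(fa, fb)] for fa, fb in zip(ra, rb)]
            for ra, rb in zip(grid_a, grid_b)]


def change_features(before, after, radius=1):
    """Residual features left after removing what each image shares."""
    b = aggregate_neighbors(before, radius)
    a = aggregate_neighbors(after, radius)
    return subtract(b, distill_common(b, a)), subtract(a, distill_common(a, b))
```

With two identical, constant feature grids the residuals returned by `change_features` are zero up to floating-point error, since everything in one image is also found in the other. Because the attention here is soft, non-constant inputs generally leave small residuals even when the images match; a trained model would learn projections that sharpen this comparison.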

Code Implementations: 1 repo