CV · CL · Jan 26, 2023

Style-Aware Contrastive Learning for Multi-Style Image Captioning

arXiv:2301.11367v1277 citations · h-index: 51
Originality: Incremental advance
AI Analysis

This work improves multi-style image captioning for applications that need diverse linguistic outputs. It is incremental: it builds on existing methods by adding style-aware contrastive learning.

The paper addresses multi-style image captioning by modeling the previously overlooked relationship between linguistic style and visual content, and its experiments report state-of-the-art performance.

Existing multi-style image captioning methods show promising results in generating a caption with accurate visual content and the desired linguistic style. However, these methods overlook the relationship between linguistic style and visual content. To overcome this drawback, we propose style-aware contrastive learning for multi-style image captioning. First, we present a style-aware visual encoder with contrastive learning to mine potential visual content relevant to style. Moreover, we propose a style-aware triplet contrast objective to distinguish whether the image, style, and caption match. To provide positive and negative samples for contrastive learning, we present three retrieval schemes (object-based retrieval, RoI-based retrieval, and triplet-based retrieval) and design a dynamic trade-off function to calculate retrieval scores. Experimental results demonstrate that our approach achieves state-of-the-art performance. In addition, we conduct an extensive analysis to verify the effectiveness of our method.
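
The abstract does not state the exact form of the triplet contrast objective, so the following is only a minimal sketch of one common way such an objective could look, not the authors' formulation. It assumes an InfoNCE-style loss, a hypothetical fusion of image and style embeddings by simple addition, and cosine similarity as the matching score. The names `triplet_score` and `info_nce`, the fusion rule, and the temperature value are all illustrative assumptions.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(dot(v, v))
    return list(v) if norm == 0 else [a / norm for a in v]


def triplet_score(image, style, caption):
    """Assumed matching score for an (image, style, caption) triplet.

    Hypothetical fusion: add the image and style embeddings, then take the
    cosine similarity between the fused vector and the caption embedding.
    """
    fused = normalize([i + s for i, s in zip(image, style)])
    return dot(fused, normalize(caption))


def info_nce(pos_score, neg_scores, temperature=0.1):
    """InfoNCE loss: -log softmax of the positive score among all scores.

    pos_score is the score of the matched triplet; neg_scores are scores of
    mismatched triplets (for example, retrieved negatives).
    """
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


# Toy 2-D embeddings, chosen only for illustration.
image = [1.0, 0.0]
style = [0.0, 1.0]
matched_caption = [1.0, 1.0]      # aligned with image + style
mismatched_caption = [1.0, -1.0]  # orthogonal to image + style

pos = triplet_score(image, style, matched_caption)
neg = triplet_score(image, style, mismatched_caption)
loss = info_nce(pos, [neg])
```

In this toy setup the matched triplet scores higher than the mismatched one, so the loss is close to zero; swapping the roles of the two captions would produce a large loss. In a real system the negatives would come from a sampling or retrieval step such as the three schemes named in the abstract, whose details are not given here.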

