CV MM | Nov 25, 2023

Incorporating granularity bias as the margin into contrastive loss for video captioning

arXiv:2311.14977v11.5h-index: 1
Originality: Incremental advance
AI Analysis

This paper addresses biased caption generation in video understanding, offering an incremental improvement over existing debiasing methods.

The paper tackles granularity bias in video captioning, where long-tail phrase distributions lead models to generate vague sentences. It introduces a statistical bias extractor and incorporates its output into a contrastive loss, reporting state-of-the-art CIDEr scores of 57.17 on MSRVTT and 138.68 on MSVD.

Video captioning models are prone to the long-tail distribution of phrases, which leads them to generate vague sentences instead of accurate ones. However, existing debiasing strategies tend to draw on external knowledge to build dependency trees of words, or to refine the frequency distribution through complex losses and extra input features; these approaches lack interpretability and are hard to train. To mitigate the impact of granularity bias on the model, we introduce a statistics-based bias extractor. This extractor quantifies the information content of sentences and videos, providing an estimate of the likelihood that a video-sentence pair is affected by granularity bias. Furthermore, following the growing trend of integrating contrastive learning into video captioning, we use a bidirectional triplet loss to obtain more negative samples per batch. We then incorporate this bias estimate, as a margin score, into the contrastive learning loss, establishing distinct training objectives for head and tail sentences and making training on tail samples more effective. Our simple yet effective loss, which incorporates Granularity bias as the margin, is referred to as the Margin-Contrastive Loss (GMC Loss). The proposed model achieves state-of-the-art performance, with a CIDEr of 57.17 on MSRVTT and 138.68 on MSVD.
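The abstract does not give the formulas, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes that a sentence's information content can be approximated by the average smoothed self-information of its words, that the margin grows linearly with that score, and that the loss is a hinge-style triplet loss applied in both directions (video to sentence and sentence to video) over an in-batch similarity matrix. The names `sentence_information`, `margin_from_information`, and `bidirectional_margin_triplet_loss`, along with the `base` and `scale` parameters, are hypothetical.

```python
import math
from collections import Counter


def sentence_information(sentence, word_counts, total_words):
    """Average self-information (nats) of a sentence's words.

    Assumption: add-one smoothed unigram frequencies stand in for the
    paper's statistical bias extractor, which is not specified here.
    """
    words = sentence.lower().split()
    vocab_size = len(word_counts)
    if not words:
        return 0.0
    info = 0.0
    for w in words:
        prob = (word_counts[w] + 1) / (total_words + vocab_size)
        info += -math.log(prob)
    return info / len(words)


def margin_from_information(info, base=0.1, scale=0.05):
    """Hypothetical mapping: more informative (tail) sentences get a larger margin."""
    return base + scale * info


def bidirectional_margin_triplet_loss(sim, margins):
    """Hinge triplet loss in both directions with a per-pair margin.

    sim[i][j] is the similarity between video i and sentence j; matched
    pairs sit on the diagonal. margins[i] is the margin for pair i.
    """
    n = len(sim)
    if n < 2:
        return 0.0  # no in-batch negatives available
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        for j in range(n):
            if i == j:
                continue
            # video -> sentence: video i should score sentence i above sentence j
            total += max(0.0, margins[i] + sim[i][j] - pos)
            # sentence -> video: sentence i should score video i above video j
            total += max(0.0, margins[i] + sim[j][i] - pos)
    return total / (2 * n * (n - 1))


# Toy usage with made-up captions and similarities.
captions = ["a man is talking", "a chef is dicing onions"]
corpus = captions + ["a man is cooking"]
counts = Counter(w for c in corpus for w in c.lower().split())
total = sum(counts.values())
margins = [margin_from_information(sentence_information(c, counts, total))
           for c in captions]
sim = [[0.9, 0.1], [0.2, 0.8]]
loss = bidirectional_margin_triplet_loss(sim, margins)
```

In this toy batch the more specific caption ("a chef is dicing onions") receives the larger margin, so its pair must be separated from in-batch negatives by a wider gap before its loss term vanishes. That is one plausible reading of "distinct training objectives for head and tail sentences".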

