CL · AI · May 16, 2024

Enhancing Semantics in Multimodal Chain of Thought via Soft Negative Sampling

arXiv:2405.09848v1 · 83 citations · h-index: 23 · Has Code · LREC
Originality: Incremental advance
AI Analysis

This paper addresses the problem of improving answer accuracy in multimodal reasoning tasks, but the advance is incremental because it builds on existing contrastive learning frameworks.

The paper tackles hallucination in multimodal chain-of-thought reasoning by proposing SNSE-CoT, a rationale generation method trained with soft negative samples, i.e., rationales with high textual quality but illogical semantics. It demonstrates the method's effectiveness on the ScienceQA dataset.

Chain of thought (CoT) has proven useful for problems requiring complex reasoning. Many of these problems are both textual and multimodal. Given the inputs in different modalities, a model generates a rationale and then uses it to answer a question. Because of the hallucination issue, the generated soft negative rationales with high textual quality but illogical semantics do not always help improve answer accuracy. This study proposes a rationale generation method using soft negative sampling (SNSE-CoT) to mitigate hallucinations in multimodal CoT. Five methods were applied to generate soft negative samples that shared highly similar text but had different semantics from the original. Bidirectional margin loss (BML) was applied to introduce them into the traditional contrastive learning framework that involves only positive and negative samples. Extensive experiments on the ScienceQA dataset demonstrated the effectiveness of the proposed method. Code and data are released at https://github.com/zgMin/SNSE-CoT.
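The abstract names a bidirectional margin loss (BML) but does not give its form. The following is a minimal sketch, assuming the formulation commonly used for soft negatives in contrastive sentence embedding: the gap between the anchor's similarity to the soft negative and its similarity to the positive is pushed into a band [-beta, -alpha]. The function names, the plain-list vectors, and the default values of `alpha` and `beta` are illustrative assumptions, not taken from the paper or its repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bidirectional_margin_loss(anchor, positive, soft_negative,
                              alpha=0.1, beta=0.3):
    """Hinge loss on both sides of a similarity gap (assumed formulation).

    delta = cos(anchor, soft_negative) - cos(anchor, positive)
    The loss is zero when -beta <= delta <= -alpha, so the soft negative
    is kept less similar than the positive, but not arbitrarily far away.
    alpha and beta are hypothetical defaults chosen for illustration.
    """
    delta = cosine(anchor, soft_negative) - cosine(anchor, positive)
    too_close = max(0.0, delta + alpha)   # soft negative nearly as similar as the positive
    too_far = max(0.0, -delta - beta)     # soft negative treated like a hard negative
    return too_close + too_far


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    positive = [1.0, 0.0]
    print(bidirectional_margin_loss(anchor, positive, [0.8, 0.6]))
    print(bidirectional_margin_loss(anchor, positive, [1.0, 0.0]))
    print(bidirectional_margin_loss(anchor, positive, [0.0, 1.0]))
```

In this sketch the loss penalises a soft negative both when it is indistinguishable from the positive and when it is pushed as far away as an ordinary negative, which matches the abstract's description of samples that share similar text but differ in semantics.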

Code Implementations: 1 repo
