AI | CL | Apr 6, 2024

Soft-Prompting with Graph-of-Thought for Multi-modal Representation Learning

arXiv:2404.04538v1 | 88 citations | h-index: 7 | LREC
Originality: Incremental advance
AI Analysis

This work targets the mismatch between linear chain-of-thought prompting and non-linear human thought processes in multi-modal AI. The method is novel, but the advance is likely incremental because it builds on existing chain-of-thought techniques.

The paper tackles the limitation of linear chain-of-thought reasoning in multi-modal tasks by proposing an Aggregation-Graph-of-Thought (AGoT) mechanism for soft-prompt tuning, achieving good results in tasks like text-image retrieval, visual question answering, and image recognition with improved domain generalization.

The chain-of-thought technique has been well received in multi-modal tasks. It is a step-by-step linear reasoning process that adjusts the length of the chain to improve the performance of generated prompts. However, human thought processes are predominantly non-linear: they encompass multiple aspects simultaneously and employ dynamic adjustment and updating mechanisms. Therefore, we propose a novel Aggregation-Graph-of-Thought (AGoT) mechanism for soft-prompt tuning in multi-modal representation learning. AGoT models the human thought process not only as a chain but also treats each step as a reasoning aggregation graph, which captures the multiple aspects of thinking that single-step reasoning overlooks. This turns the entire reasoning process into prompt aggregation and prompt flow operations. Experiments show that our multi-modal model enhanced with AGoT soft-prompting achieves good results in several tasks such as text-image retrieval, visual question answering, and image recognition. In addition, we demonstrate that it has good domain generalization performance due to better reasoning.
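
The two operations named in the abstract, prompt aggregation and prompt flow, can be illustrated with a small sketch. This is not the authors' implementation. It assumes that each reasoning step holds several "aspect" prompt vectors, that aggregation is a softmax-weighted sum of those vectors, and that flow is a fixed linear blend between the previous step's prompt and the current one. The function names (`aggregate`, `flow`, `agot_prompt`), the hand-set scores, and the constant `alpha` are hypothetical; in a trained model these quantities would be learned.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(aspects: Sequence[Vector], scores: Sequence[float]) -> Vector:
    """Prompt aggregation (assumed form): softmax-weighted sum of the
    aspect prompts that belong to a single reasoning step."""
    weights = softmax(scores)
    dim = len(aspects[0])
    return [sum(w * a[i] for w, a in zip(weights, aspects)) for i in range(dim)]


def flow(previous: Vector, current: Vector, alpha: float) -> Vector:
    """Prompt flow (assumed form): linear blend of the previous step's
    prompt with the current step's aggregated prompt."""
    return [(1.0 - alpha) * p + alpha * c for p, c in zip(previous, current)]


def agot_prompt(
    steps: Sequence[Sequence[Vector]],
    step_scores: Sequence[Sequence[float]],
    alpha: float = 0.5,
) -> Vector:
    """Aggregate the aspects of every step, then chain the steps with flow."""
    prompt = aggregate(steps[0], step_scores[0])
    for aspects, scores in zip(steps[1:], step_scores[1:]):
        prompt = flow(prompt, aggregate(aspects, scores), alpha)
    return prompt


# Two reasoning steps, each with two 2-dimensional aspect prompts.
steps = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 2.0], [4.0, 4.0]],
]
# Equal scores give equal aggregation weights within each step.
step_scores = [[0.0, 0.0], [0.0, 0.0]]
final_prompt = agot_prompt(steps, step_scores, alpha=0.5)
print(final_prompt)
```

With equal scores, each step reduces to the mean of its aspect prompts, and the chain then mixes step means. Unequal scores would let one aspect dominate a step, which is the sense in which a graph per step can represent more than a single linear chain.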

Code Implementations: 1 repo
