CV · CL · Aug 15, 2023

Link-Context Learning for Multimodal LLMs

arXiv:2308.07891v1 · 29 citations · h-index: 24 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses training-free generalization for MLLMs. The contribution is incremental, as it builds on existing in-context learning methods.

The paper tackles a challenge for multimodal large language models (MLLMs): recognizing unseen images and understanding novel concepts without training. It proposes link-context learning (LCL), which strengthens causal reasoning within in-context learning. On the ISEKAI dataset, the resulting model shows stronger link-context learning capabilities than vanilla MLLMs.

The ability to learn from context with novel concepts and deliver appropriate responses is essential in human conversations. Despite current Multimodal Large Language Models (MLLMs) and Large Language Models (LLMs) being trained on mega-scale datasets, recognizing unseen images or understanding novel concepts in a training-free manner remains a challenge. In-Context Learning (ICL) explores training-free few-shot learning, where models are encouraged to "learn to learn" from limited tasks and generalize to unseen tasks. In this work, we propose link-context learning (LCL), which emphasizes "reasoning from cause and effect" to augment the learning capabilities of MLLMs. LCL goes beyond traditional ICL by explicitly strengthening the causal relationship between the support set and the query set. By providing demonstrations with causal links, LCL guides the model to discern not only the analogy but also the underlying causal associations between data points, which empowers MLLMs to recognize unseen images and understand novel concepts more effectively. To facilitate the evaluation of this novel approach, we introduce the ISEKAI dataset, consisting exclusively of unseen generated image-label pairs designed for link-context learning. Extensive experiments show that our LCL-MLLM exhibits strong link-context learning capabilities to novel concepts over vanilla MLLMs. Code and data will be released at https://github.com/isekai-portal/Link-Context-Learning.
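To make the support-set/query relationship concrete, here is a minimal sketch of how a link-context style prompt might be assembled. It is not the authors' prompt format or code. The `Demo` class, the `<image:...>` placeholder syntax, the helper names, and the labels "cactihog" and "hedgehog" are all hypothetical and used only for illustration. The property it tries to capture is that the query can only be answered by reusing an image-label mapping stated in the support set, because the novel label does not exist outside the context.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Demo:
    image_id: str  # hypothetical placeholder standing in for an actual image
    label: str     # label this demonstration links to the image


def build_link_context_prompt(support: List[Demo], query_image_id: str) -> str:
    """Assemble support demonstrations followed by a query (hypothetical format).

    The labels are assumed to be novel (absent from pretraining data), so the
    only way to answer the query is to reuse a mapping defined in the support set.
    """
    turns = [f"<image:{d.image_id}> This is a {d.label}." for d in support]
    # Offer only the labels the support set has defined, in a stable order.
    options = sorted(answer_space(support))
    turns.append(f"<image:{query_image_id}> Is this a {' or a '.join(options)}?")
    return "\n".join(turns)


def answer_space(support: List[Demo]) -> Set[str]:
    """Labels the query answer is tied to: exactly those defined in context."""
    return {d.label for d in support}


if __name__ == "__main__":
    support = [
        Demo("img_001", "cactihog"),  # positive example of the novel concept
        Demo("img_002", "hedgehog"),  # contrasting example
        Demo("img_003", "cactihog"),
    ]
    print(build_link_context_prompt(support, "img_004"))
```

In this sketch the contrasting demonstration matters: with only positive examples, a model could answer "cactihog" without looking at the query image at all. A real pipeline would interleave image embeddings rather than text placeholders.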

Code Implementations: 1 repo
