CV | AI | CL | LG | Dec 11, 2023

Honeybee: Locality-enhanced Projector for Multimodal LLM

arXiv:2312.06742v2 | 236 citations | h-index: 13 | Has Code | CVPR
Originality: Incremental advance
AI Analysis

This work addresses a key bottleneck in MLLMs, the visual projector, which affects both visual understanding and efficiency. It represents an incremental advance in projector design.

The paper addresses the design of an effective visual projector for Multimodal Large Language Models (MLLMs). It proposes a flexible, locality-enhanced projector that outperforms previous state-of-the-art methods on benchmarks such as MME and MMBench while being significantly more efficient.

In Multimodal Large Language Models (MLLMs), a visual projector plays a crucial role in bridging pre-trained vision encoders with LLMs, enabling profound visual understanding while harnessing the LLMs' robust capabilities. Despite the importance of the visual projector, it has been relatively less explored. In this study, we first identify two essential projector properties: (i) flexibility in managing the number of visual tokens, crucial for MLLMs' overall efficiency, and (ii) preservation of local context from visual features, vital for spatial understanding. Based on these findings, we propose a novel projector design that is both flexible and locality-enhanced, effectively satisfying the two desirable properties. Additionally, we present comprehensive strategies to effectively utilize multiple and multifaceted instruction datasets. Through extensive experiments, we examine the impact of individual design choices. Finally, our proposed MLLM, Honeybee, remarkably outperforms previous state-of-the-art methods across various benchmarks, including MME, MMBench, SEED-Bench, and LLaVA-Bench, achieving significantly higher efficiency. Code and models are available at https://github.com/kakaobrain/honeybee.
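The two projector properties named in the abstract can be illustrated with a toy example. The sketch below is not the Honeybee projector; it is a minimal, dependency-free illustration that assumes visual features arrive as an H x W grid of vectors and uses plain adaptive average pooling as a stand-in. The function name `adaptive_avg_pool_tokens` and the toy grid are hypothetical. It shows (i) flexibility, since the output token count `out_h * out_w` is chosen freely, and (ii) locality, since each output token averages only a contiguous spatial window.

```python
def adaptive_avg_pool_tokens(grid, out_h, out_w):
    """Pool an H x W grid of feature vectors down to out_h * out_w tokens.

    Illustrative stand-in only (an assumption, not the paper's design).
    Each output token is the mean of one contiguous spatial window.
    """
    in_h, in_w = len(grid), len(grid[0])
    dim = len(grid[0][0])
    tokens = []
    for i in range(out_h):
        # Window rows: floor(i*H/out_h) .. ceil((i+1)*H/out_h), in integer arithmetic
        r0 = (i * in_h) // out_h
        r1 = ((i + 1) * in_h + out_h - 1) // out_h
        for j in range(out_w):
            c0 = (j * in_w) // out_w
            c1 = ((j + 1) * in_w + out_w - 1) // out_w
            window = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            # Average every feature dimension over the local window
            tokens.append(
                [sum(vec[k] for vec in window) / len(window) for k in range(dim)]
            )
    return tokens


# Toy 4 x 4 grid whose 2-d "features" are just (row, column) coordinates,
# so the pooled values reveal which region each output token covers.
grid = [[[float(r), float(c)] for c in range(4)] for r in range(4)]

pooled = adaptive_avg_pool_tokens(grid, 2, 2)   # 16 patches -> 4 tokens
coarser = adaptive_avg_pool_tokens(grid, 1, 1)  # 16 patches -> 1 token
```

Changing `out_h` and `out_w` trades visual detail for fewer tokens passed to the LLM. Because each token summarizes one spatial region, neighbouring patches are not mixed with distant ones.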

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
