Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs
This work addresses the problem of evaluating conversational adaptation in MLLMs for researchers in AI and linguistics, showing it is incremental as it highlights a gap in current training regimes without proposing a new solution.
The study introduced ICCA, an automated framework to evaluate whether multimodal large language models (MLLMs) can increase communication efficiency during interactions, similar to humans, and found that while MLLMs can understand more efficient language from others, they do not spontaneously make their own language more efficient over time, with this ability only elicited in some models like GPT-4 through heavy prompting.
Humans spontaneously use increasingly efficient language as interactions progress, by adapting and forming ad-hoc conventions. This phenomenon has been studied extensively using reference games, showing properties of human language that go beyond relaying intents. It remains unexplored whether multimodal large language models (MLLMs) similarly increase communication efficiency during interactions, and what mechanisms they may adopt for this purpose. We introduce ICCA, an automated framework to evaluate such conversational adaptation as an in-context behavior in MLLMs. We evaluate several state-of-the-art MLLMs, and observe that while they may understand the increasingly efficient language of their interlocutor, they do not spontaneously make their own language more efficient over time. This latter ability can only be elicited in some models (e.g., GPT-4) with heavy-handed prompting. This shows that this property of linguistic interaction does not arise from current training regimes, even though it is a common hallmark of human language. ICCA is available at https://github.com/lil-lab/ICCA.