CL | Mar 20, 2023

Multimodal Shannon Game with Images

ETH Zurich
arXiv:2303.11192v2 | 1 citation | h-index: 48
Originality: Incremental advance
AI Analysis

This work studies how multimodal information can improve language understanding and modeling. The advance is incremental: it extends the classic Shannon game with one additional modality.

The authors extended the Shannon game by adding optional image information and found that this multimodal approach improved both confidence and accuracy for human participants and a GPT-2 language model, with nouns and determiners benefiting more and priming effects increasing with context size.

The Shannon game has long been used as a thought experiment in linguistics and NLP, asking participants to guess the next letter in a sentence based on its preceding context. We extend the game by introducing an optional extra modality in the form of image information. To investigate the impact of multimodal information in this game, we use human participants and a language model (LM, GPT-2). We show that the addition of image information improves both self-reported confidence and accuracy for both humans and the LM. Certain word classes, such as nouns and determiners, benefit more from the additional modality information. The priming effect in both humans and the LM becomes more apparent as the context size (extra modality information + sentence context) increases. These findings highlight the potential of multimodal information in improving language understanding and modeling.
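
The game loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' setup: the guesser is a hypothetical toy that completes words from a small hand-written vocabulary (`VOCAB`) instead of querying GPT-2, and the image is replaced by a set of `hint_words` naming objects that might appear in it. The names `ranked_guesses`, `play`, `first_guess_accuracy`, and `hint_bonus` are invented for this sketch.

```python
from collections import Counter
import string

ALPHABET = string.ascii_lowercase + " "

# Hypothetical toy vocabulary; a real setup would query a language model instead.
VOCAB = ["the", "a", "dog", "cat", "car", "cap",
         "sits", "runs", "on", "grass", "mat"]


def ranked_guesses(context, hint_words=(), hint_bonus=5):
    """Rank every symbol in ALPHABET as a guess for the next letter.

    Words listed in `hint_words` (an assumed stand-in for image information)
    receive extra weight when they match the word being completed.
    """
    partial = context.rsplit(" ", 1)[-1]  # letters of the current word so far
    scores = Counter()
    for word in VOCAB:
        if word.startswith(partial):
            # A fully typed word predicts a space next.
            next_char = word[len(partial)] if len(word) > len(partial) else " "
            scores[next_char] += 1 + (hint_bonus if word in hint_words else 0)
    ranked = [ch for ch, _ in scores.most_common()]
    # Unscored symbols follow in alphabet order so every target is reachable.
    ranked += [ch for ch in ALPHABET if ch not in ranked]
    return ranked


def play(sentence, hint_words=()):
    """Play the game on `sentence`; return the guesses needed for each letter."""
    sentence = "".join(ch for ch in sentence.lower() if ch in ALPHABET)
    return [
        ranked_guesses(sentence[:i], hint_words).index(target) + 1
        for i, target in enumerate(sentence)
    ]


def first_guess_accuracy(guess_counts):
    """Fraction of letters guessed correctly on the first attempt."""
    return sum(1 for g in guess_counts if g == 1) / len(guess_counts)


if __name__ == "__main__":
    sentence = "the dog sits on the grass"
    text_only = play(sentence)
    with_hint = play(sentence, hint_words={"dog", "grass"})
    print("text only:", sum(text_only), first_guess_accuracy(text_only))
    print("with hint:", sum(with_hint), first_guess_accuracy(with_hint))
```

Comparing `play(sentence)` with `play(sentence, hint_words=...)` mirrors the text-only versus text-plus-image conditions. Any numbers this toy produces reflect only its hand-picked vocabulary and weighting, and say nothing about the results reported in the paper.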
