Beneath the Surface: Investigating LLMs' Capabilities for Communicating with Subtext
This work addresses LLMs' limited ability to handle the nuanced, creative communication that humans use routinely, a capability that matters for AI-human interaction. Its contribution is incremental: it provides new evaluation methods rather than solutions.
The researchers investigated whether language models can use subtext in communication. They found that frontier models exhibit a strong bias toward overly literal communication (even the best-performing models generate literal clues 60% of the time in one environment), but that some models can sometimes leverage common ground to reduce overly literal clues by 30%-50%.
Human communication is fundamentally creative and often makes use of subtext: implied meaning that goes beyond the literal content of the text. Here, we systematically study whether language models can use subtext in communicative settings, and we introduce four new evaluation suites to assess these capabilities. Our evaluation settings range from writing and interpreting allegories to playing multi-agent and multi-modal games inspired by the rules of board games like Dixit. We find that frontier models generally exhibit a strong bias toward overly literal, explicit communication and thereby fail to account for nuanced constraints: even the best-performing models generate literal clues 60% of the time in one of our environments, Visual Allusions. However, we find that some models can sometimes make use of common ground with another party to help them communicate with subtext, achieving a 30%-50% reduction in overly literal clues; they struggle, however, to infer the presence of common ground when it is not explicitly stated. For allegory understanding, we find that paratextual and persona conditions significantly shift the interpretation of subtext. Overall, our work provides quantifiable measures for an inherently complex and subjective phenomenon such as subtext and reveals many weaknesses and idiosyncrasies of current LLMs. We hope this research inspires future work toward socially grounded creative communication and reasoning.