AI CL HC AP | Feb 6, 2024

Limits of Large Language Models in Debating Humans

James Flamino, Mohammed Shahid Modi, Boleslaw K. Szymanski, Brendan Cross, Colton Mikolajczyk

arXiv:2402.06049v2 | 9.67 citations | h-index: 55 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the viability of using LLMs as artificial partners in sociological experiments, showing incremental progress by identifying specific limitations in human-agent interactions.

The study tested LLM-based agents in debate games with humans, finding that agents improved group productivity but were perceived as less convincing and confident than humans, with measurable behavioral differences.

Large Language Models (LLMs) have shown remarkable promise in communicating with humans. Their potential use as artificial partners with humans in sociological experiments involving conversation is an exciting prospect. But how viable is it? Here, we rigorously test the limits of agents that debate using LLMs in a preregistered study that runs multiple debate-based opinion consensus games. Each game starts with six humans, six agents, or three humans and three agents. We found that agents can blend in and concentrate on a debate's topic better than humans, improving the productivity of all players. Yet, humans perceive agents as less convincing and confident than other humans, and several behavioral metrics of humans and agents we collected deviate measurably from each other. We observed that agents are already decent debaters, but their behavior generates a pattern distinctly different from the human-generated data.
