CL · May 28

Can LLM Teams Play What? Where? When?

Anastasia Kotelnikova, Viktor Byzov, Maria Dolzhenkova, Evgeny Kotelnikov

arXiv:2605.30459 · 30.6 · h-index: 2

AI Analysis

This research addresses a limitation of LLMs on tasks that require indirect reasoning and cultural knowledge, and offers the LLM research community a way to improve their performance through team-based interaction.

This paper investigates whether team-based interaction improves LLM performance in the quiz game What? Where? When? (ChGK), which requires indirect reasoning and cultural knowledge. Team-based strategies, particularly the Talkative Team, outperformed single-model baselines by up to 20 percentage points in accuracy. The best team reached 44.23% accuracy, approaching human team performance on the questions for which human statistics are available.

Large language models (LLMs) remain limited on tasks requiring indirect reasoning, cultural knowledge, and coordinated hypothesis testing. We investigate whether team-based interaction improves LLM performance in What? Where? When? (ChGK), a quiz game designed to reward collective reasoning. We introduce three team strategies: Voting, Silent Team (the captain observes final answers), and Talkative Team (the captain observes both answers and rationales). To minimize data leakage, we evaluate these strategies on a dataset consisting of 572 ChGK questions released in 2025. Using six recent large-scale open models, we show that team-based strategies outperform single-model baselines, yielding gains of up to 20 percentage points in accuracy. The best team achieves 44.23% accuracy, and approaches human team performance on questions with available human statistics. Analysis of inter-model diversity reveals that disagreement strongly predicts lower accuracy, but explanatory communication substantially mitigates performance drops. We further examine captain behavior and find no evidence of self-preference bias; access to peer rationales improves captain judgments. Overall, LLM teams function primarily as answer selection and error-filtering mechanisms rather than generators of novel solutions. Our findings highlight the importance of interaction and suggest adaptive strategies as a promising direction for multi-agent systems.
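To make the difference between the three strategies concrete, here is a minimal Python sketch under stated assumptions: each team member returns an (answer, rationale) pair, Voting is a plurality vote over lightly normalized answers, and the captain is any callable that maps a prompt string to an answer. The names `MemberResponse`, `captain_prompt`, `team_answer`, the prompt wording, and the `disagreement` score (share of members outside the plurality answer) are hypothetical illustrations, not the authors' implementation or their diversity measure.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class MemberResponse:
    """One team member's output (hypothetical structure)."""
    model: str
    answer: str
    rationale: str


def normalize(answer: str) -> str:
    # Light normalization so "Paris " and "paris" count as the same answer.
    return " ".join(answer.casefold().split())


def voting(responses: Sequence[MemberResponse]) -> str:
    """Voting: plurality over normalized answers; ties go to the earliest member."""
    if not responses:
        raise ValueError("empty team")
    counts = Counter(normalize(r.answer) for r in responses)
    best = max(counts.values())
    for r in responses:  # member order gives a deterministic tie-break
        if counts[normalize(r.answer)] == best:
            return normalize(r.answer)
    raise AssertionError("unreachable")


def captain_prompt(question: str, responses: Sequence[MemberResponse],
                   talkative: bool) -> str:
    """Silent Team: captain sees answers only. Talkative Team: answers plus rationales."""
    lines = [f"Question: {question}", "Team members' responses:"]
    for i, r in enumerate(responses, start=1):
        lines.append(f"{i}. Answer: {r.answer}")
        if talkative:
            lines.append(f"   Rationale: {r.rationale}")
    lines.append("As captain, give the team's final answer.")
    return "\n".join(lines)


def disagreement(responses: Sequence[MemberResponse]) -> float:
    """Hypothetical diversity score: share of members not giving the plurality answer."""
    if not responses:
        raise ValueError("empty team")
    counts = Counter(normalize(r.answer) for r in responses)
    return 1.0 - max(counts.values()) / len(responses)


def team_answer(question: str, responses: Sequence[MemberResponse], strategy: str,
                captain: Optional[Callable[[str], str]] = None) -> str:
    """Dispatch to one of the three strategies; `captain` stands in for an LLM call."""
    if strategy == "voting":
        return voting(responses)
    if strategy in ("silent", "talkative"):
        if captain is None:
            raise ValueError("captain strategies need a captain callable")
        prompt = captain_prompt(question, responses, talkative=(strategy == "talkative"))
        return normalize(captain(prompt))
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    team = [
        MemberResponse("model-a", "Paris", "rationale a"),
        MemberResponse("model-b", "paris ", "rationale b"),
        MemberResponse("model-c", "Lyon", "rationale c"),
    ]
    print(voting(team))                   # paris
    print(round(disagreement(team), 3))   # 0.333
    print(team_answer("Q?", team, "talkative", captain=lambda p: "Lyon"))  # lyon
```

In this sketch, the only difference between Silent Team and Talkative Team is whether member rationales enter the captain's prompt, which mirrors the distinction the abstract draws. A real setup would replace the `captain` stub with a call to one of the evaluated models.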

