LLMs as Agentic Cooperative Players in Multiplayer UNO
This work addresses the challenge of evaluating LLMs as cooperative assistants in interactive tasks such as games.
The researchers tested whether LLMs can actively assist humans by deploying them as cooperative agents in the game UNO. All models outperformed a random baseline, but few significantly helped another player win.
LLMs promise to assist humans, not just by answering questions but by offering useful guidance across a wide range of tasks. But how far does that assistance go? Can an agent based on a large language model actually help someone accomplish their goal as an active participant? We test this question by engaging an LLM in UNO, a turn-based card game, and asking it not to win but to help another player do so. We built a tool that allows decoder-only LLMs to participate as agents within the RLCard game environment. These models receive full game-state information and respond using simple text prompts under two distinct prompting strategies. We evaluate models ranging from small (1B parameters) to large (70B parameters) and explore how model scale impacts performance. We find that while all models outperformed a random baseline when playing UNO, few were able to significantly aid another player.
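To make the agent setup concrete, the following is a minimal sketch of how a text-prompted cooperative agent might be wired up. It is not the authors' tool and does not use the RLCard API: the state keys (`hand`, `target`, `legal_actions`, `partner_cards`), the prompt wording, the card strings, and the `llm` callable are all hypothetical stand-ins chosen for illustration. The sketch renders a game state as text, asks the model for a move, and falls back to a random legal action when the reply names no legal move.

```python
import random
from typing import Callable, Dict, List


def build_prompt(hand: List[str], target: str,
                 legal_actions: List[str], partner_cards: int) -> str:
    """Render a game state as plain text (hypothetical prompt format)."""
    return (
        "You are playing UNO. Your goal is to help your partner win, "
        "not to win yourself.\n"
        f"Top card: {target}\n"
        f"Your hand: {', '.join(hand)}\n"
        f"Partner's cards left: {partner_cards}\n"
        f"Legal actions: {', '.join(legal_actions)}\n"
        "Reply with exactly one legal action."
    )


def parse_action(reply: str, legal_actions: List[str],
                 rng: random.Random) -> str:
    """Pick the earliest legal action named in the reply.

    Ties at the same position go to the longest match, so a longer card
    name is not mistaken for a shorter one that is its prefix.
    If nothing matches, fall back to a random legal action.
    """
    text = reply.lower()
    hits = [(text.find(a.lower()), -len(a), a)
            for a in legal_actions if a.lower() in text]
    if hits:
        return min(hits)[2]
    return rng.choice(legal_actions)


class LLMAgent:
    """Wraps any text-in, text-out model (assumed interface) as a player."""

    def __init__(self, llm: Callable[[str], str], seed: int = 0):
        self.llm = llm
        self.rng = random.Random(seed)

    def act(self, state: Dict) -> str:
        prompt = build_prompt(state["hand"], state["target"],
                              state["legal_actions"], state["partner_cards"])
        return parse_action(self.llm(prompt), state["legal_actions"], self.rng)


if __name__ == "__main__":
    # A canned reply stands in for a real model call.
    state = {"hand": ["r-5", "g-skip", "b-2"], "target": "r-7",
             "legal_actions": ["r-5", "draw"], "partner_cards": 2}
    agent = LLMAgent(lambda prompt: "I will play r-5.")
    print(agent.act(state))
```

The random fallback mirrors the random baseline mentioned above: an agent whose replies never parse would behave like that baseline, which gives a natural floor for comparison.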