CL · AI · LG · MA · Apr 15, 2025

TextArena

IBM, Peking U
arXiv:2504.11442v2 · 16 citations · h-index: 30 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses a gap in how researchers and developers evaluate the agentic behavior of LLMs, though it is incremental, since it builds on existing game-based frameworks.

The authors tackle the lack of benchmarks for dynamic social skills such as negotiation and deception in LLMs by introducing TextArena, an open-source collection of 57+ text-based games that supports evaluation through an online-play system with real-time TrueSkill scores.
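The source does not state which TrueSkill parameters or draw handling the online leaderboard uses. As a rough illustration of how a single two-player result moves ratings, here is a minimal sketch of the standard two-player TrueSkill update without draws, assuming the common default constants μ₀ = 25, σ₀ = 25/3, β = σ₀/2 and τ = σ₀/100. The names `Rating` and `rate_1vs1` are illustrative and are not taken from TextArena's code.

```python
import math
from dataclasses import dataclass

# Assumed defaults (the usual TrueSkill constants); TextArena's actual
# leaderboard settings are not given in the source.
MU0 = 25.0
SIGMA0 = MU0 / 3.0
BETA = SIGMA0 / 2.0    # per-game performance noise
TAU = SIGMA0 / 100.0   # dynamics noise added before each game


@dataclass
class Rating:
    mu: float = MU0
    sigma: float = SIGMA0

    def conservative(self) -> float:
        """Conservative skill estimate mu - 3*sigma, often used for ranking."""
        return self.mu - 3.0 * self.sigma


def _pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def rate_1vs1(winner: Rating, loser: Rating) -> tuple[Rating, Rating]:
    """Two-player TrueSkill update assuming no draws (draw margin of zero)."""
    # Inflate both variances by the dynamics noise so ratings can keep moving.
    var_w = winner.sigma ** 2 + TAU ** 2
    var_l = loser.sigma ** 2 + TAU ** 2
    c = math.sqrt(2.0 * BETA ** 2 + var_w + var_l)
    t = (winner.mu - loser.mu) / c
    v = _pdf(t) / _cdf(t)  # mean correction: large for surprising upsets
    w = v * (v + t)        # variance correction, between 0 and 1
    new_winner = Rating(winner.mu + var_w / c * v,
                        math.sqrt(var_w * (1.0 - var_w / c ** 2 * w)))
    new_loser = Rating(loser.mu - var_l / c * v,
                       math.sqrt(var_l * (1.0 - var_l / c ** 2 * w)))
    return new_winner, new_loser


# Usage: two fresh players, player a wins one game.
a, b = rate_1vs1(Rating(), Rating())
```

An upset (a low-rated model beating a high-rated one) produces a much larger mean shift than an expected win, and every game shrinks σ, which is why ratings can be updated in real time after each match. Ranking by `conservative()` favors models that have both a high mean and enough games to be certain of it; whether TextArena ranks this way is an assumption.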

TextArena is an open-source collection of competitive text-based games for training and evaluating agentic behavior in Large Language Models (LLMs). It spans 57+ unique environments (including single-player, two-player, and multi-player setups) and allows for easy evaluation of model capabilities via an online-play system (against humans and other submitted models) with real-time TrueSkill scores. Traditional benchmarks rarely assess dynamic social skills such as negotiation, theory of mind, and deception, creating a gap that TextArena addresses. Designed with research, community, and extensibility in mind, TextArena emphasizes ease of adding new games, adapting the framework, testing models, playing against models, and training models. Detailed documentation of the environments, games, leaderboard, and examples is available at https://github.com/LeonGuertler/TextArena and https://www.textarena.ai/.
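The abstract stresses how easily new games can be added. To make that idea concrete, below is a minimal sketch of a turn-based, two-player text game and a loop that lets two text-in, text-out agents play it. Everything here (`TextGameEnv`, `TextNim`, `play`, and the `take N` action format) is hypothetical and is not TextArena's actual API; the repository documents the real interface.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional, Tuple

# Hypothetical interface for illustration only; TextArena's real base
# classes and method names may differ.
Agent = Callable[[str], str]  # maps a text observation to a text action


class TextGameEnv(ABC):
    num_players: int = 2

    @abstractmethod
    def reset(self) -> None: ...

    @abstractmethod
    def observe(self, player_id: int) -> str: ...

    @abstractmethod
    def step(self, player_id: int, action: str) -> Tuple[bool, Optional[int]]:
        """Apply an action and return (done, winner_id or None)."""


class TextNim(TextGameEnv):
    """Toy game: remove 1-3 stones per turn; whoever takes the last stone wins."""

    def __init__(self, stones: int = 10):
        self.start = stones
        self.stones = stones

    def reset(self) -> None:
        self.stones = self.start

    def observe(self, player_id: int) -> str:
        return f"Player {player_id}: {self.stones} stones left. Reply 'take N' with N in 1-3."

    def step(self, player_id: int, action: str) -> Tuple[bool, Optional[int]]:
        parts = action.strip().lower().split()
        if len(parts) != 2 or parts[0] != "take" or parts[1] not in {"1", "2", "3"}:
            return True, 1 - player_id  # a malformed move forfeits the game
        n = int(parts[1])
        if n > self.stones:
            return True, 1 - player_id  # taking too many stones also forfeits
        self.stones -= n
        if self.stones == 0:
            return True, player_id
        return False, None


def play(env: TextGameEnv, agents: Dict[int, Agent]) -> Optional[int]:
    """Alternate turns until the environment reports the game is over."""
    env.reset()
    player = 0
    while True:
        done, winner = env.step(player, agents[player](env.observe(player)))
        if done:
            return winner
        player = (player + 1) % env.num_players


def always_one(obs: str) -> str:
    return "take 1"


def optimal_agent(obs: str) -> str:
    # Parse the stone count and leave the opponent a multiple of 4 when possible.
    stones = int(obs.split(":")[1].split()[0])
    return f"take {stones % 4 or 1}"


winner = play(TextNim(10), {0: optimal_agent, 1: always_one})  # player 0 wins
```

Keeping the game logic behind a small `reset`/`observe`/`step` surface means the play loop never changes when a new game is added, and an LLM-backed agent can be swapped in for either callable without touching the environment.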

Code Implementations: 1 repo