AI · CL · Apr 19, 2025

TALES: Text Adventure Learning Environment Suite

Microsoft
arXiv:2504.14128v4 · 10 citations · h-index: 26
Originality: Synthesis-oriented
AI Analysis

This paper addresses the need for better reasoning benchmarks in AI, but its contribution is incremental: it focuses on evaluation rather than new methods.

The authors address the problem of evaluating reasoning in large language models by introducing TALES, a suite of text-adventure games. They found that while models performed well on synthetic games, even the top-performing ones failed to reach 15% on human-designed games.

Reasoning is an essential skill to enable Large Language Models (LLMs) to interact with the world. As tasks become more complex, they demand increasingly sophisticated and diverse reasoning capabilities for sequential decision-making, requiring structured reasoning over the context history to determine the next best action. We introduce TALES, a diverse collection of synthetic and human-written text-adventure games designed to challenge and evaluate diverse reasoning capabilities. We present results over a range of LLMs, open- and closed-weights, performing a qualitative analysis on the top performing models. Despite an impressive showing on synthetic games, even the top LLM-driven agents fail to achieve 15% on games designed for human enjoyment. Code and visualization of the experiments can be found at https://microsoft.github.io/tale-suite.
