Eric Yuan

AIJul 8, 2022

Automatic Exploration of Textual Environments with Language-Conditioned Autotelic Agents

Laetitia Teodorescu, Eric Yuan, Marc-Alexandre Côté et al. · microsoft-research

In this extended abstract we discuss the opportunities and challenges of studying intrinsically-motivated agents for exploration in textual environments. We argue that there is important synergy between text environments and autonomous agents. We identify key properties of text worlds that make them suitable for exploration by autonmous agents, namely, depth, breadth, progress niches and the ease of use of language goals; we identify drivers of exploration for such agents that are implementable in text worlds. We discuss the opportunities of using autonomous agents to make progress on text environment benchmarks. Finally we list some specific challenges that need to be overcome in this area.

CLMay 24, 2023Code

ByteSized32: A Corpus and Challenge Task for Generating Task-Specific World Models Expressed as Text Games

Ruoyao Wang, Graham Todd, Eric Yuan et al.

In this work, we investigate the capacity of language models to generate explicit, interpretable, and interactive world models of scientific and common-sense reasoning tasks. We operationalize this as a task of generating text games, expressed as hundreds of lines of Python code. To facilitate this task, we introduce ByteSized32 (Code: github.com/cognitiveailab/BYTESIZED32), a corpus of 32 reasoning-focused text games totaling 20k lines of Python code. We empirically demonstrate that GPT-4 can use these games as templates for single-shot in-context learning, successfully producing runnable games on unseen topics in 28% of cases. When allowed to self-reflect on program errors, game runnability substantially increases to 57%. While evaluating simulation fidelity is labor-intensive, we introduce a suite of automated metrics to assess game fidelity, technical validity, adherence to task specifications, and winnability, showing a high degree of agreement with expert human ratings. We pose this as a challenge task to spur further development at the juncture of world modeling and code generation.

Eric Yuan

2 Papers