CL, AI | Oct 13, 2022

Dungeons and Dragons as a Dialog Challenge for Artificial Intelligence

arXiv:2210.07109v1 | 302 citations | h-index: 68
Originality: Synthesis-oriented
AI Analysis

This work addresses a domain-specific challenge for AI researchers by framing D&D as a testbed for dialogue systems, though it is incremental in applying existing methods to new data.

The paper frames Dungeons and Dragons as a dialogue system challenge. It builds a dataset of nearly 900 games with 800,000 dialogue turns and trains a large language model to generate the next turn and to predict the game state. A human evaluation assesses how plausible and interesting the generated output is, and an automatic evaluation assesses state tracking.

AI researchers have posited Dungeons and Dragons (D&D) as a challenge problem to test systems on various language-related capabilities. In this paper, we frame D&D specifically as a dialogue system challenge, where the tasks are to both generate the next conversational turn in the game and predict the state of the game given the dialogue history. We create a gameplay dataset consisting of nearly 900 games, with a total of 7,000 players, 800,000 dialogue turns, 500,000 dice rolls, and 58 million words. We automatically annotate the data with partial state information about the game play. We train a large language model (LM) to generate the next game turn, conditioning it on different information. The LM can respond as a particular character or as the player who runs the game--i.e., the Dungeon Master (DM). It is trained to produce dialogue that is either in-character (roleplaying in the fictional world) or out-of-character (discussing rules or strategy). We perform a human evaluation to determine what factors make the generated output plausible and interesting. We further perform an automatic evaluation to determine how well the model can predict the game state given the history and examine how well tracking the game state improves its ability to produce plausible conversational output.
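
To make the conditioning idea in the abstract concrete, the following is a minimal sketch of how a dialogue history with partial state annotations might be flattened into a prompt for next-turn generation. It is an illustration only, not the paper's actual data format or model input: the `Turn` class, its field names, the `[ROLE|MODE|speaker]` tag layout, and the `build_prompt` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One dialogue turn with partial state annotations (hypothetical fields)."""
    speaker: str               # character or player name
    text: str                  # what was said in this turn
    is_dm: bool = False        # True when the Dungeon Master is speaking
    in_character: bool = True  # False for out-of-character talk (rules, strategy)


def control_tag(speaker: str, is_dm: bool, in_character: bool) -> str:
    """Build a control prefix such as "[DM|OOC|Sam]" (assumed layout)."""
    role = "DM" if is_dm else "PLAYER"
    mode = "IC" if in_character else "OOC"
    return f"[{role}|{mode}|{speaker}]"


def build_prompt(history: List[Turn], next_speaker: str, is_dm: bool,
                 in_character: bool, max_turns: int = 3) -> str:
    """Join the most recent turns and append the control tag for the turn to generate.

    max_turns is assumed to be a positive integer.
    """
    lines = [
        f"{control_tag(t.speaker, t.is_dm, t.in_character)} {t.text}"
        for t in history[-max_turns:]
    ]
    # The trailing tag tells the model who should speak next and in which mode.
    lines.append(control_tag(next_speaker, is_dm, in_character))
    return "\n".join(lines)


history = [
    Turn("Mira", "I search the chest."),
    Turn("Sam", "Roll a perception check.", is_dm=True, in_character=False),
]
prompt = build_prompt(history, "Mira", is_dm=False, in_character=True)
print(prompt)
```

Changing the arguments for the final tag (DM versus player, in-character versus out-of-character) changes what the model is asked to produce while the history stays fixed, which is one simple way to condition generation on different information.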