CL Oct 20, 2025

Does Reasoning Help LLM Agents Play Dungeons and Dragons? A Prompt Engineering Experiment

Patricia Delafuente, Arya Honraopatil, Lara J. Martin

arXiv:2510.18112v1

2.7

Originality: Synthesis-oriented

AI Analysis

This is an incremental study for DnD players and developers using LLMs for game automation.

The paper addresses the problem of generating Dungeons & Dragons player actions as Discord bot commands using LLMs. It finds that an instruct model such as LLaMA-3.1-8B-Instruct is sufficient for this task compared with reasoning models, and that prompt wording significantly affects the output.

This paper explores the application of Large Language Models (LLMs) and reasoning to predict Dungeons & Dragons (DnD) player actions and format them as Avrae Discord bot commands. Using the FIREBALL dataset, we evaluated a reasoning model, DeepSeek-R1-Distill-LLaMA-8B, and an instruct model, LLaMA-3.1-8B-Instruct, for command generation. Our findings highlight the importance of providing specific instructions to models, show that even single-sentence changes in prompts can greatly affect model output, and indicate that instruct models are sufficient for this task compared to reasoning models.
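To make the setup concrete, here is a minimal sketch of how a single-sentence prompt change and the command formatting step might look. It is not the authors' code: the prompt wording, the `SPECIFIC_INSTRUCTION` sentence, and the helper names `build_prompt` and `extract_command` are all hypothetical. The sketch assumes only that Avrae commands begin with `!` and that a reasoning model such as DeepSeek-R1-Distill-LLaMA-8B may wrap its reasoning in `<think>...</think>` tags before the answer.

```python
import re
from typing import Optional

# Hypothetical instruction sentence; the paper's actual prompts are not reproduced here.
SPECIFIC_INSTRUCTION = "Respond with exactly one Avrae command that starts with '!'."

# Reasoning models may emit their chain of thought inside <think> tags.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def build_prompt(game_state: str, utterance: str, specific: bool = True) -> str:
    """Assemble a prompt; `specific` toggles one instruction sentence."""
    parts = [
        "You are a Dungeons & Dragons player.",
        f"Game state: {game_state}",
        f"Player says: {utterance}",
    ]
    if specific:
        parts.append(SPECIFIC_INSTRUCTION)
    return "\n".join(parts)


def extract_command(model_output: str) -> Optional[str]:
    """Return the first line that looks like a bot command, or None."""
    # Drop any reasoning text so commands mentioned there are ignored.
    visible = THINK_BLOCK.sub("", model_output)
    for line in visible.splitlines():
        line = line.strip()
        if line.startswith("!") and len(line) > 1:
            return line
    return None


# Illustrative model output; the command string is an example, not dataset content.
raw = "<think>maybe !cast</think>\n!attack longsword -t goblin"
command = extract_command(raw)
```

Comparing `build_prompt(..., specific=True)` against `specific=False` on the same inputs is one simple way to isolate the effect of a single sentence, and running `extract_command` on both models' outputs gives a uniform check on whether a usable command was produced.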

