HC AI CL | Jan 23, 2025

Toyteller: AI-powered Visual Storytelling Through Toy-Playing with Character Symbols

John Joon Young Chung, Melissa Roemmele, Max Kreminski

arXiv:2501.13284v117.014 citations | h-index: 7 | CHI

Originality: Incremental advance

AI Analysis

This work addresses the problem of enhancing human-AI interaction for storytelling by combining toy-playing motions with language, though it is incremental in integrating existing models.

The researchers tackled the challenge of generating visual stories by developing Toyteller, an AI system that uses character symbol motions to steer text and visual outputs, which outperformed GPT-4o in evaluations and helped users express intentions hard to verbalize.

We introduce Toyteller, an AI-powered storytelling system where users generate a mix of story text and visuals by directly manipulating character symbols as if they were playing with toys. Anthropomorphized symbol motions can convey rich and nuanced social interactions; Toyteller leverages these motions (1) to let users steer story text generation and (2) as a visual output format that accompanies story text. We enabled motion-steered text generation and text-steered motion generation by mapping motions and text onto a shared semantic space so that large language models and motion generation models can use it as a translational layer. Technical evaluations showed that Toyteller outperforms a competitive baseline, GPT-4o. Our user study found that toy-playing helps users express intentions that are difficult to verbalize. However, motions alone could not express all user intentions, which suggests combining them with other modalities such as language. We discuss the design space of toy-playing interactions and implications for technical HCI research on human-AI interaction.
