HC · AI · CL · Jan 23, 2025

Toyteller: AI-powered Visual Storytelling Through Toy-Playing with Character Symbols

arXiv:2501.13284v1 · 14 citations · h-index: 7 · CHI
Originality: Incremental advance
AI Analysis

This work addresses the problem of enhancing human-AI interaction for storytelling by combining toy-playing motions with language, though it is incremental in integrating existing models.

The researchers tackled the challenge of generating visual stories by developing Toyteller, an AI system that uses character symbol motions to steer text and visual outputs, which outperformed GPT-4o in evaluations and helped users express intentions hard to verbalize.

We introduce Toyteller, an AI-powered storytelling system where users generate a mix of story text and visuals by directly manipulating character symbols as if they were playing with toys. Anthropomorphized symbol motions can convey rich and nuanced social interactions; Toyteller leverages these motions (1) to let users steer story text generation and (2) as a visual output format that accompanies story text. We enabled motion-steered text generation and text-steered motion generation by mapping motions and text onto a shared semantic space so that large language models and motion generation models can use it as a translational layer. Technical evaluations showed that Toyteller outperforms a competitive baseline, GPT-4o. Our user study identified that toy-playing helps express intentions that are difficult to verbalize. However, motions alone could not express all user intentions, which suggests combining them with other modalities such as language. We discuss the design space of toy-playing interactions and implications for technical HCI research on human-AI interaction.

