FairyTailor: A Multimodal Generative Framework for Storytelling
This work addresses engaging, creative story generation for users, particularly in children's entertainment, though its contribution is incremental: it adds interactivity and multimodality to existing methods.
The authors tackle the challenge of open-ended multimodal storytelling by introducing FairyTailor, a human-in-the-loop framework for co-creating children's fairytales from generated text and retrieved images. The result is a dynamic tool that lets users build stories interactively and share them.
Storytelling is an open-ended task that entails creative thinking and requires a constant flow of ideas. Natural language generation (NLG) for storytelling is especially challenging because it requires the generated text to follow an overall theme while remaining creative and diverse to engage the reader. In this work, we introduce a system and a web-based demo, FairyTailor, for human-in-the-loop visual story co-creation. Users can create a cohesive children's fairytale by weaving generated texts and retrieved images with their input. FairyTailor adds another modality and modifies the text generation process to produce a coherent and creative sequence of text and images. To our knowledge, this is the first dynamic tool for multimodal story generation that allows interactive co-formation of both texts and images. It allows users to give feedback on co-created stories and share their results.