CL, AI · Jan 23

Retell, Reward, Repeat: Reinforcement Learning for Narrative Theory-Informed Story Generation

David Y. Liu, Xanthe Muston, Aditya Joshi, Sebastian Sequoiah-Grayson

arXiv:2601.17226v1 · 1 citation · h-index: 3

Originality: Incremental advance

AI Analysis

This work addresses the generation of more human-like and diverse stories for creative AI applications. The advance is incremental, since it builds on existing reinforcement learning methods.

The paper tackles automatic story generation by using reinforcement learning (d-RLAIF) as a post-training alternative to supervised fine-tuning. Outputs of the post-trained models are evaluated with Gemini-3-Flash and compared to human-written stories from the TimeTravel dataset; the authors report stories that are more diverse and better aligned with human narrative conventions.

Despite the subjective nature of storytelling, past works on automatic story generation (ASG) have relied on limited ground truths for training and evaluation. In this work, we explore reinforcement learning (d-RLAIF) as a post-training alternative to supervised fine-tuning (SFT). We first apply Todorov's Theory of Narrative Equilibrium to establish principles that define desirable ASG qualities. We prompt 7B and 14B LLM-as-judge models with our principles to test alignment with human annotators and provide reward signals during d-RLAIF. We use Gemini-3-Flash to evaluate the output of our post-trained models and compare them to human-written stories from the TimeTravel dataset. We show that d-RLAIF offers a viable alternative to SFT, producing stories that are more diverse and aligned with human narrative conventions. Our paper demonstrates the promise of reinforcement learning for linguistically grounded post-training for subjective tasks such as ASG.
