CL, AI · Jan 13

STAGE: A Benchmark for Knowledge Graph Construction, Question Answering, and In-Script Role-Playing over Movie Screenplays

Qiuyu Tian, Yiding Li, Fengyi Chen, Zequn Liu, Youyong Kong, Fan Guo, Yuyao Li, Jinjing Shen, Zhijing Xie, Yiyun Luo, Xin Zhang

arXiv:2601.08510v1 · 1 citation

Originality: Incremental advance

AI Analysis

STAGE provides a holistic evaluation framework for narrative understanding in AI, addressing a gap left by prior benchmarks that each focused on an individual subtask.

The authors address the problem of evaluating whether models can construct a coherent story world from a movie screenplay and use it consistently across multiple reasoning and generation tasks. They introduce STAGE, a unified benchmark with four tasks (knowledge graph construction, event summarization, question answering, and role-playing), all grounded in a shared narrative representation and covering 150 films in English and Chinese.

Movie screenplays are rich long-form narratives that interleave complex character relationships, temporally ordered events, and dialogue-driven interactions. While prior benchmarks target individual subtasks such as question answering or dialogue generation, they rarely evaluate whether models can construct a coherent story world and use it consistently across multiple forms of reasoning and generation. We introduce STAGE (Screenplay Text, Agents, Graphs and Evaluation), a unified benchmark for narrative understanding over full-length movie screenplays. STAGE defines four tasks: knowledge graph construction, scene-level event summarization, long-context screenplay question answering, and in-script character role-playing, all grounded in a shared narrative world representation. The benchmark provides cleaned scripts, curated knowledge graphs, and event- and character-centric annotations for 150 films across English and Chinese, enabling holistic evaluation of models' abilities to build world representations, abstract and verify narrative events, reason over long narratives, and generate character-consistent responses.
