CL, Jan 7

Evaluation Framework for AI Creativity: A Case Study Based on Story Generation

Pharath Sathya, Yin Jou Huang, Fei Cheng

arXiv:2601.03698v1, 0.6, h-index: 15

Originality: Incremental advance

AI Analysis

This paper addresses the problem of subjective creativity assessment in AI story generation, which matters to both researchers and practitioners. The advance is incremental, however, because it builds on existing evaluation methods.

The paper tackles the challenge of evaluating creative text generation by proposing a structured framework with four components and eleven sub-components. A crowdsourced study of 115 readers indicates that creativity is evaluated hierarchically and that reflective evaluation changes both ratings and inter-rater agreement.

Evaluating creative text generation remains a challenge because existing reference-based metrics fail to capture the subjective nature of creativity. We propose a structured evaluation framework for AI story generation comprising four components (Novelty, Value, Adherence, and Resonance) and eleven sub-components. Using controlled story generation via "Spike Prompting" and a crowdsourced study of 115 readers, we examine how different creative components shape both immediate and reflective human creativity judgments. Our findings show that creativity is evaluated hierarchically rather than cumulatively, with different dimensions becoming salient at different stages of judgment, and that reflective evaluation substantially alters both ratings and inter-rater agreement. Together, these results support the effectiveness of our framework in revealing dimensions of creativity that are obscured by reference-based evaluation.
