CL · Jul 21, 2025

Help Me Write a Story: Evaluating LLMs' Ability to Generate Writing Feedback

Hannah Rashkin, Elizabeth Clark, Fantine Huot, Mirella Lapata

arXiv:2507.16007v1 · 13.910 citations · h-index: 86 · ACL

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of using LLMs to support creative writers, though the advance is incremental because it builds on existing evaluation methods.

The paper tackles the problem of evaluating whether LLMs can generate meaningful writing feedback for creative writers. It introduces a new task, a dataset, and evaluation frameworks, and finds that models provide specific and mostly accurate feedback but often fail to identify the biggest issue in a story and to balance critical and positive feedback.

Can LLMs provide support to creative writers by giving meaningful writing feedback? In this paper, we explore the challenges and limitations of model-generated writing feedback by defining a new task, dataset, and evaluation frameworks. To study model performance in a controlled manner, we present a novel test set of 1,300 stories that we corrupted to intentionally introduce writing issues. We study the performance of commonly used LLMs in this task with both automatic and human evaluation metrics. Our analysis shows that current models have strong out-of-the-box behavior in many respects -- providing specific and mostly accurate writing feedback. However, models often fail to identify the biggest writing issue in the story and to correctly decide when to offer critical vs. positive feedback.
