The Next Chapter: A Study of Large Language Models in Storytelling
This paper addresses the challenge of generating high-quality stories for NLP applications, but its contribution is incremental, since it applies existing LLMs to a specific domain.
The paper tackles story generation quality by comparing large language models (LLMs) such as GPT-3 with recent story generation models across three datasets. It finds that LLMs produce significantly higher-quality stories and are competitive with human authors, although they sometimes replicate real stories in a way that resembles plagiarism.
To improve the quality of generated stories, recent story generation models have explored the use of higher-level attributes such as plots or commonsense knowledge. Prompt-based learning with large language models (LLMs), exemplified by GPT-3, has shown strong performance across a wide range of natural language processing (NLP) tasks. This paper conducts a comprehensive study, using both automatic and human evaluation, that compares the story generation capacity of LLMs with that of recent models on three datasets differing in style, register, and story length. The results show that LLMs generate stories of significantly higher quality than other story generation models. They also perform competitively with human authors, although a preliminary observation suggests that in situations involving world knowledge they tend to replicate real stories, a behavior that resembles plagiarism.