Assessing Language Models' Worldview for Fiction Generation
This work identifies a key limitation of current LLMs for computational creativity applications such as fiction writing: their lack of a consistent worldview. The contribution is incremental, as it builds on existing evaluations of LLM reliability.
The study assessed how suitable Large Language Models (LLMs) are for generating fiction by testing whether they maintain a consistent worldview. Only two of the nine models tested were consistent, and the rest were self-conflicting. An analysis of stories from four models revealed a uniform narrative pattern.
The use of Large Language Models (LLMs) has become ubiquitous, with abundant applications in computational creativity. One such application is fictional story generation. Fiction is a narrative that takes place in a story world slightly different from ours. With LLMs becoming writing partners, we question how suitable they are for generating fiction. This study investigates the ability of LLMs to maintain a world state, which is essential for generating fiction. Through a series of questions posed to nine LLMs, we find that only two models exhibit a consistent worldview, while the rest are self-conflicting. Subsequent analysis of stories generated by four models revealed a strikingly uniform narrative pattern. This uniformity across models further suggests a lack of the 'state' necessary for fiction. We highlight the limitations of current LLMs in fiction writing and advocate for future research to test and create story worlds for LLMs to reside in. All code, the dataset, and the generated responses are available at https://github.com/tanny411/llm-reliability-and-consistency-evaluation.
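To make the idea of a "self-conflicting" model concrete, the following is a minimal sketch of one way such consistency could be scored. It is not the authors' protocol. It assumes a hypothetical setup in which each item is a statement and its negation, each is asked several times, and the free-text replies are reduced to yes/no by a crude prefix heuristic. The names `normalize`, `is_consistent`, and `consistency_rate` are illustrative only. Under these assumptions, an item counts as consistent only if the model gives the same answer on every repeat and gives opposite answers to the statement and its negation.

```python
def normalize(answer: str) -> str:
    """Hypothetical heuristic: map a reply to 'yes', 'no', or 'other' by its prefix."""
    a = answer.strip().lower()
    if a.startswith("yes"):
        return "yes"
    if a.startswith("no"):
        return "no"
    return "other"


def is_consistent(stmt_answers, negated_answers) -> bool:
    """Consistent if repeats agree with each other and the negated
    statement receives the opposite answer (assumed criterion)."""
    orig = {normalize(a) for a in stmt_answers}
    neg = {normalize(a) for a in negated_answers}
    if len(orig) != 1 or len(neg) != 1:
        return False  # the model changed its answer across repeated prompts
    return {orig.pop(), neg.pop()} == {"yes", "no"}


def consistency_rate(items) -> float:
    """Fraction of (statement answers, negation answers) pairs judged consistent."""
    if not items:
        return 0.0
    return sum(is_consistent(s, n) for s, n in items) / len(items)


if __name__ == "__main__":
    items = [
        (["Yes.", "yes, it is"], ["No.", "No"]),  # consistent
        (["Yes", "No"], ["No", "No"]),            # flips across repeats
        (["Yes", "Yes"], ["Yes", "Yes"]),         # agrees with both statement and negation
    ]
    print(consistency_rate(items))  # 0.3333333333333333
```

A prefix heuristic is deliberately simple; a real evaluation would need more robust answer parsing, since replies that hedge or explain before answering fall into the 'other' bucket and are counted as inconsistent here.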