CL | Dec 31, 2024

Echoes in AI: Quantifying lack of plot diversity in LLM outputs

arXiv:2501.00273v2 | 44 citations | h-index: 42 | PNAS
Originality: Incremental advance
AI Analysis

This work addresses a critical problem for creative content generation with LLMs: limited diversity across generated outputs.

The study quantifies the lack of plot diversity in LLM-generated stories by introducing the Sui Generis score. It finds that models such as GPT-4 and LLaMA-3 frequently echo plot elements across generations, whereas plots from human-written stories are rarely echoed.

With rapid advances in large language models (LLMs), there has been an increasing application of LLMs in creative content ideation and generation. A critical question emerges: can current LLMs provide ideas that are diverse enough to truly bolster collective creativity? We examine two state-of-the-art LLMs, GPT-4 and LLaMA-3, on story generation and discover that LLM-generated stories often consist of plot elements that are echoed across a number of generations. To quantify this phenomenon, we introduce the Sui Generis score, an automatic metric that measures the uniqueness of a plot element among alternative storylines generated using the same prompt under an LLM. Evaluating on 100 short stories, we find that LLM-generated stories often contain combinations of idiosyncratic plot elements echoed frequently across generations and across different LLMs, while plots from the original human-written stories are rarely recreated or even echoed in pieces. Moreover, our human evaluation shows that the ranking of Sui Generis scores among story segments correlates moderately with human judgment of surprise level, even though score computation is completely automatic without relying on human judgment.
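
As a rough illustration of the idea behind a uniqueness metric of this kind, the sketch below scores a plot element by how rarely it is echoed among alternative storylines generated for the same prompt. It is not the paper's Sui Generis formula: the word-overlap matcher `is_echo`, the `threshold` parameter, and the add-one smoothing are all assumptions made for this example.

```python
import math
import re


def tokens(text):
    """Lowercased word set; a crude stand-in for real plot-element matching."""
    return set(re.findall(r"[a-z']+", text.lower()))


def is_echo(element, storyline, threshold=0.5):
    """Hypothetical matcher: the element counts as echoed when at least
    `threshold` of its words appear in the storyline."""
    elem = tokens(element)
    if not elem:
        return False
    return len(elem & tokens(storyline)) / len(elem) >= threshold


def uniqueness_score(element, alternatives, threshold=0.5):
    """Negative log of the smoothed fraction of alternatives echoing the element.

    Higher values mean the element is rarer among the alternatives.
    Add-one smoothing (an assumption) keeps the score finite.
    """
    echoes = sum(is_echo(element, alt, threshold) for alt in alternatives)
    return -math.log((echoes + 1) / (len(alternatives) + 1))


alternatives = [
    "The detective finds a hidden letter in the attic",
    "A hidden letter reveals the secret",
    "The dog runs away to the sea",
]
common = uniqueness_score("hidden letter", alternatives)
rare = uniqueness_score("dragon wedding", alternatives)
```

In this toy setup, an element that appears in most alternatives receives a low score and an element that appears in none receives the highest score available for that number of alternatives. A realistic matcher would need to compare plot elements semantically, not by shared words.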

