CL · May 29, 2025

Tell, Don't Show: Leveraging Language Models' Abstractive Retellings to Model Literary Themes

Berkeley
arXiv:2505.23166v1 · 2 citations · h-index: 12 · Has Code · ACL
Originality: Incremental advance
AI Analysis

This work addresses the challenge of analyzing themes in literature, which matters to researchers and educators, but the advance is incremental because it builds on existing LDA and LM techniques.

The authors address topic modeling for literary texts, where conventional methods such as LDA struggle because narrative language focuses on sensory details. They propose Retell, which uses language models to retell passages as abstract concepts before applying LDA. The resulting topics are more precise and informative than those from LDA alone or from directly asking LMs to list topics. The authors also compare the method's outputs to expert-guided annotations in a case study on racial/cultural identity in high school English language arts books.

Conventional bag-of-words approaches for topic modeling, like latent Dirichlet allocation (LDA), struggle with literary text. Literature challenges lexical methods because narrative language focuses on immersive sensory details instead of abstractive description or exposition: writers are advised to "show, don't tell." We propose Retell, a simple, accessible topic modeling approach for literature. Here, we prompt resource-efficient, generative language models (LMs) to tell what passages show, thereby translating narratives' surface forms into higher-level concepts and themes. By running LDA on LMs' retellings of passages, we can obtain more precise and informative topics than by running LDA alone or by directly asking LMs to list topics. To investigate the potential of our method for cultural analytics, we compare our method's outputs to expert-guided annotations in a case study on racial/cultural identity in high school English language arts books.
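The two-stage idea in the abstract (retell each passage with an LM, then run LDA on the retellings) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the prompt wording, the function names `retell_all`, `tokenize`, and `lda_gibbs`, the stopword list, and the hyperparameters are all assumptions, the LM is any caller-supplied callable from prompt to text, and a small collapsed Gibbs sampler stands in for whatever LDA implementation the paper uses.

```python
import random
import re

# Assumed prompt wording for illustration; not the prompt used in the paper.
PROMPT = "Describe in general terms what this passage is about:\n\n{passage}"

def retell_all(passages, lm):
    """Stage 1: ask a caller-supplied LM (prompt -> text) to retell each passage."""
    return [lm(PROMPT.format(passage=p)) for p in passages]

def tokenize(text, stopwords=frozenset({"the", "a", "an", "and", "of", "to", "in"})):
    """Lowercase, keep alphabetic tokens, drop a tiny illustrative stopword list."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords]

def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, top_n=5, seed=0):
    """Stage 2: LDA via collapsed Gibbs sampling; returns top words per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]       # topic counts per document
    topic_word = [[0] * V for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                     # total tokens per topic
    z = []                                             # topic assignment per token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], index[w]
                # Remove the token's current assignment before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return [
        sorted(vocab, key=lambda w: -topic_word[k][index[w]])[:top_n]
        for k in range(num_topics)
    ]
```

To use the sketch, pass any function that maps a prompt string to generated text as `lm`, tokenize the retellings, and hand the token lists to `lda_gibbs`. Running the same sampler on the tokenized original passages instead gives the "LDA alone" baseline that the abstract contrasts with.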

Code Implementations: 1 repo
