CL · May 29, 2025

Tell, Don't Show: Leveraging Language Models' Abstractive Retellings to Model Literary Themes

Li Lucy, Camilla Griffiths, Sarah Levine, Jennifer L. Eberhardt, Dorottya Demszky, David Bamman

Berkeley

arXiv:2505.23166v1 · 6.72 citations · h-index: 12 · Has Code · ACL

Originality: Incremental advance

AI Analysis

This work addresses the challenge of analyzing themes in literature, a task relevant to researchers and educators. The advance is incremental because it builds on existing LDA and LM techniques.

The authors tackle topic modeling for literary texts, where conventional methods like LDA struggle because narrative language focuses on sensory details. They propose Retell, which prompts language models to retell passages as abstract concepts and then applies LDA to those retellings. The resulting topics are more precise and informative than those from LDA alone or from directly asking LMs to list topics. The authors also compare the method's outputs to expert-guided annotations in a case study on racial/cultural identity in high school English language arts books.

Conventional bag-of-words approaches for topic modeling, like latent Dirichlet allocation (LDA), struggle with literary text. Literature challenges lexical methods because narrative language focuses on immersive sensory details instead of abstractive description or exposition: writers are advised to "show, don't tell." We propose Retell, a simple, accessible topic modeling approach for literature. Here, we prompt resource-efficient, generative language models (LMs) to tell what passages show, thereby translating narratives' surface forms into higher-level concepts and themes. By running LDA on LMs' retellings of passages, we can obtain more precise and informative topics than by running LDA alone or by directly asking LMs to list topics. To investigate the potential of our method for cultural analytics, we compare our method's outputs to expert-guided annotations in a case study on racial/cultural identity in high school English language arts books.
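
The two-step idea in the abstract (retell each passage with an LM, then run LDA on the retellings) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_retell` is a hypothetical stand-in that returns canned retellings instead of calling a language model, the passages and retellings are invented toy data, the stopword list is illustrative, and `lda_gibbs` is a small standard-library collapsed Gibbs sampler used in place of a real LDA library. The hyperparameters (`alpha`, `beta`, `iters`) are arbitrary choices.

```python
import random
import re
from typing import Callable, List, Tuple

# Assumption: a tiny illustrative stopword list, not the paper's preprocessing.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "her", "his",
             "about", "with", "after", "before", "from"}


def tokenize(text: str) -> List[str]:
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def lda_gibbs(docs: List[List[str]], num_topics: int, iters: int = 100,
              alpha: float = 0.1, beta: float = 0.01,
              seed: int = 0) -> Tuple[List[str], List[List[int]], List[List[int]]]:
    """Collapsed Gibbs sampling for LDA; returns vocab and the two count tables."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]       # tokens per (doc, topic)
    topic_word = [[0] * V for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(zs)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, z = word_id[w], assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                weights = [(doc_topic[d][k] + alpha) * (topic_word[k][v] + beta)
                           / (topic_total[k] + V * beta) for k in range(num_topics)]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    return vocab, doc_topic, topic_word


def retell_then_lda(passages: List[str], retell_fn: Callable[[str], str],
                    num_topics: int):
    """Step 1: retell each passage. Step 2: run LDA on the retellings."""
    retellings = [retell_fn(p) for p in passages]
    return lda_gibbs([tokenize(r) for r in retellings], num_topics)


def top_words(vocab: List[str], topic_word: List[List[int]], n: int = 3):
    """Return the n highest-count words for each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in topic_word]


# Invented toy data: "showing" passages mapped to canned "telling" retellings.
CANNED = {
    "Rain drummed on the tin roof as she folded the letter twice.":
        "A woman feels grief and loneliness after news from her family.",
    "She set two plates on the table, then put one back.":
        "A woman feels grief and loneliness about a family loss.",
    "He laced his boots in the dark, listening for the whistle.":
        "A soldier feels fear and duty before war.",
    "The train pulled out and the men stopped singing.":
        "Soldiers feel fear about war and duty.",
}


def toy_retell(passage: str) -> str:
    """Hypothetical stand-in for an LM call; a real pipeline would prompt a model."""
    return CANNED[passage]


passages = list(CANNED)
vocab, doc_topic, topic_word = retell_then_lda(passages, toy_retell, num_topics=2)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(f"topic {k}: {words}")
```

Passing the retelling step in as `retell_fn` keeps the LDA stage independent of any particular model or prompt, so the stub can be swapped for a real LM client without touching the rest of the sketch.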
