CL LG · Feb 5, 2025

Looking for the Inner Music: Probing LLMs' Understanding of Literary Style

arXiv:2502.03647v14.92 citations · h-index: 14 · Computational Humanities Research

Originality: Synthesis-oriented

AI Analysis

This work addresses how to interpret LLMs' stylistic capabilities, a question of interest to researchers in computational linguistics. It is incremental in that it builds on prior stylometry research.

The study investigates how large language models (LLMs) understand literary style by replicating earlier authorship identification results and extending them to genre classification. It finds that models rely on different strategies: some lean on memorization, while others benefit more from training. It also finds that authorial style is easier to define than genre-level style and is more affected by minor syntactic decisions and contextual word usage.

Recent work has demonstrated that language models can be trained to identify the author of much shorter literary passages than has been thought feasible for traditional stylometry. We replicate these results for authorship and extend them to a new dataset measuring novel genre. We find that LLMs are able to distinguish authorship and genre, but they do so in different ways. Some models seem to rely more on memorization, while others benefit more from training to learn author/genre characteristics. We then use three methods to probe one high-performing LLM for features that define style. These include direct syntactic ablations to input text as well as two methods that look at model internals. We find that authorial style is easier to define than genre-level style and is more impacted by minor syntactic decisions and contextual word usage. However, some traits like pronoun usage and word order prove significant for defining both kinds of literary style.
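To make the idea of "direct syntactic ablations to input text" concrete, here is a minimal Python sketch of two such ablations, chosen because the abstract names pronoun usage and word order as significant traits. It is an illustration only and not the authors' implementation: the pronoun list, the `[PRON]` mask token, and the function names `tokenize`, `mask_pronouns`, and `shuffle_word_order` are all assumptions introduced here.

```python
import random
import re

# Hypothetical pronoun list for illustration; not taken from the paper.
PRONOUNS = {
    "i", "me", "my", "you", "your", "he", "him", "his", "she", "her",
    "it", "its", "we", "us", "our", "they", "them", "their",
}


def tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)


def mask_pronouns(text, mask="[PRON]"):
    """Replace every pronoun with a placeholder, removing pronoun choice as a cue."""
    return " ".join(mask if tok.lower() in PRONOUNS else tok for tok in tokenize(text))


def shuffle_word_order(text, seed=0):
    """Randomly permute the tokens, keeping vocabulary but destroying word order."""
    tokens = tokenize(text)
    random.Random(seed).shuffle(tokens)
    return " ".join(tokens)


passage = "She gave him her book."
print(mask_pronouns(passage))
print(shuffle_word_order(passage))
```

Under this sketch, one would score a classifier on the original passages and again on the ablated ones. A large drop in accuracy after an ablation would suggest that the removed feature contributes to how the model recognizes style.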

