CL, LG · Sep 17, 2024

Says Who? Effective Zero-Shot Annotation of Focalization

Rebecca M. M. Hicke, Yuri Bizzoni, Pascale Feldkamp, Ross Deans Kristensen-McLachlan

arXiv:2409.11390v3 · 1.93 citations · h-index: 9

Originality: Synthesis-oriented

AI Analysis

This work addresses a difficult annotation problem in computational literary studies by enabling effective zero-shot annotation, though it is incremental in that it applies existing LLM methods to a new domain.

The paper tackles the challenging task of annotating focalization in literary texts and finds that large language models perform comparably to trained human annotators, with GPT-4o achieving an average F1 score of 84.79%.

Focalization describes the way in which access to narrative information is restricted or controlled based on the knowledge available to the narrator. It is encoded via a wide range of lexico-grammatical features and is subject to reader interpretation. Even trained annotators frequently disagree on correct labels, suggesting this task is both qualitatively and computationally challenging. In this work, we test how well five contemporary large language model (LLM) families and two baselines perform when annotating short literary excerpts for focalization. Despite the challenging nature of the task, we find that LLMs show comparable performance to trained human annotators, with GPT-4o achieving an average F1 of 84.79%. Further, we demonstrate that the log probabilities output by GPT-family models frequently reflect the difficulty of annotating particular excerpts. Finally, we provide a case study analyzing sixteen Stephen King novels, demonstrating the usefulness of this approach for computational literary studies and the insights gleaned from examining focalization at scale.
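As a rough illustration of how per-label log probabilities might be turned into a difficulty signal for an excerpt, the sketch below renormalizes them and computes a normalized entropy. It is not the authors' method: the label names (`internal`, `external`, `zero`), the function names, and the example values are all assumptions made for illustration, and the abstract does not say which measure the paper uses.

```python
import math


def label_distribution(logprobs):
    """Renormalize per-label log probabilities into a probability distribution."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logprobs.values())
    weights = {label: math.exp(lp - peak) for label, lp in logprobs.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def difficulty(logprobs):
    """Normalized entropy in [0, 1]: near 0 when one label dominates, 1 when uniform."""
    probs = label_distribution(logprobs)
    if len(probs) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    return entropy / math.log(len(probs))


# Hypothetical label names and values, invented for this example only.
confident = {"internal": math.log(0.97), "external": math.log(0.02), "zero": math.log(0.01)}
ambiguous = {"internal": math.log(0.40), "external": math.log(0.35), "zero": math.log(0.25)}

print(difficulty(confident) < difficulty(ambiguous))
```

Under this assumed measure, excerpts whose score is close to 1 would be the ones to route to a human annotator first.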
