CL · Oct 17, 2024

Measuring and Modifying the Readability of English Texts with GPT-4

arXiv:2410.14028v1 · 27 citations · h-index: 3 · Proceedings of the Third Workshop on Text Simplification, Accessibility and Readability (TSAR 2024)
Originality: Incremental advance
AI Analysis

This paper addresses automated readability assessment and modification, a problem relevant to users such as educators and content creators. The advance is incremental because it builds on existing LLM capabilities.

The study asks whether large language models (LLMs) can reliably assess and modify text readability. Zero-shot estimates from GPT-4 Turbo and GPT-4o mini correlated highly with human judgments (r = 0.76 and r = 0.74, respectively), and GPT-4 Turbo could make text easier or harder to read in a human experiment, though considerable variance in human judgments remained unexplained.

The success of Large Language Models (LLMs) in other domains has raised the question of whether LLMs can reliably assess and manipulate the readability of text. We approach this question empirically. First, using a published corpus of 4,724 English text excerpts, we find that readability estimates produced "zero-shot" from GPT-4 Turbo and GPT-4o mini exhibit relatively high correlation with human judgments (r = 0.76 and r = 0.74, respectively), out-performing estimates derived from traditional readability formulas and various psycholinguistic indices. Then, in a pre-registered human experiment (N = 59), we ask whether Turbo can reliably make text easier or harder to read. We find evidence to support this hypothesis, though considerable variance in human judgments remains unexplained. We conclude by discussing the limitations of this approach, including limited scope, as well as the validity of the "readability" construct and its dependence on context, audience, and goal.
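The comparison described in the abstract (model estimates versus human judgments, against a formula-based baseline) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names are hypothetical, the ratings are made-up toy values rather than data from the paper, and Flesch Reading Ease stands in for "traditional readability formulas" without any claim that the authors used it. The syllable counter is a rough vowel-group heuristic.

```python
import math
import re


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def count_syllables(word):
    """Approximate syllables as runs of consecutive vowels (heuristic assumption)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


# Toy, made-up ratings for five excerpts (NOT data from the paper).
human_ratings = [1.2, 0.4, -0.3, -1.1, -2.0]  # hypothetical human ease scores
model_ratings = [80, 65, 60, 35, 20]          # hypothetical LLM ease estimates

r_model = pearson_r(model_ratings, human_ratings)
print(f"model vs. human: r = {r_model:.2f}")
```

In a real evaluation, the same `pearson_r` call would be applied once to the model's estimates and once to the formula scores for every excerpt, so that the two correlations with human judgments can be compared directly.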

Code Implementations: 1 repo
