CL · Sep 30, 2024

Analysing Zero-Shot Readability-Controlled Sentence Simplification

Abdullah Barayan, Jose Camacho-Collados, Fernando Alva-Manchego

arXiv:2409.20246v2 · 14.925 citations · h-index: 40

Originality: Synthesis-oriented

AI Analysis

This paper addresses data scarcity in readability-controlled text simplification, a problem relevant to NLP researchers. Its contribution is incremental, however: it explores existing models without introducing a new method.

The paper tackles zero-shot readability-controlled sentence simplification using instruction-tuned large language models, with the aim of reducing reliance on scarce parallel data. It finds that models struggle to simplify sentences to the lowest readability levels, and it highlights problems with standard automatic evaluation metrics.

Readability-controlled text simplification (RCTS) rewrites texts to lower readability levels while preserving their meaning. RCTS models often depend on parallel corpora with readability annotations on both source and target sides. Such datasets are scarce and difficult to curate, especially at the sentence level. To reduce reliance on parallel data, we explore using instruction-tuned large language models for zero-shot RCTS. Through automatic and manual evaluations, we examine: (1) how different types of contextual information affect a model's ability to generate sentences with the desired readability, and (2) the trade-off between achieving target readability and preserving meaning. Results show that all tested models struggle to simplify sentences (especially to the lowest levels) due to models' limitations and characteristics of the source sentences that impede adequate rewriting. Our experiments also highlight the need for better automatic evaluation metrics tailored to RCTS, as standard ones often misinterpret common simplification operations, and inaccurately assess readability and meaning preservation.
