CL, May 21, 2023

Multilingual Simplification of Medical Texts

arXiv:2305.12532v4, 139 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of limited access to medical information for non-English speakers, though its contribution is incremental: it extends existing monolingual simplification approaches to multiple languages.

The paper tackled the lack of multilingual resources for medical text simplification by introducing MultiCochrane, a sentence-aligned dataset in English, Spanish, French, and Farsi, and evaluated models that generated viable simplified texts across these languages.

Automated text simplification aims to produce simple versions of complex texts. This task is especially useful in the medical domain, where the latest medical findings are typically communicated via complex and technical articles. This creates barriers for laypeople seeking access to up-to-date medical findings, consequently impeding progress on health literacy. Most existing work on medical text simplification has focused on monolingual settings, with the result that such evidence is available in only one language (most often, English). This work addresses this limitation via multilingual simplification, i.e., directly simplifying complex texts into simplified texts in multiple languages. We introduce MultiCochrane, the first sentence-aligned multilingual text simplification dataset for the medical domain in four languages: English, Spanish, French, and Farsi. We evaluate fine-tuned and zero-shot models across these languages, with extensive human assessments and analyses. Although models can now generate viable simplified texts, we identify outstanding challenges that this dataset might be used to address.

Code Implementations: 1 repo
