CL · AI · Dec 18, 2025

Plain language adaptations of biomedical text using LLMs: Comparison of evaluation metrics

arXiv:2512.16530v1 · 1 citation · h-index: 11 · Studies in Health Technology and Informatics
Originality: Synthesis-oriented
AI Analysis

This work addresses health literacy by making biomedical information more accessible, though its contribution is incremental, since it compares existing LLM approaches on an established task.

The study applied Large Language Models to simplify biomedical texts for better health literacy, finding that gpt-4o-mini outperformed the other approaches and that G-Eval scores aligned well with the qualitative assessments.

This study investigated the application of Large Language Models (LLMs) to simplifying biomedical texts in order to enhance health literacy. Using a public dataset that includes plain language adaptations of biomedical abstracts, we developed and evaluated several approaches: a baseline approach using a prompt template, a two-AI-agent approach, and a fine-tuning (FT) approach. We selected the OpenAI gpt-4o and gpt-4o-mini models as baselines for further research. We evaluated our approaches with quantitative metrics, such as Flesch-Kincaid grade level, SMOG Index, SARI, BERTScore, and G-Eval, as well as with a qualitative metric, namely 5-point Likert scales for simplicity, accuracy, completeness, and brevity. Results showed superior performance of gpt-4o-mini and underperformance of the FT approaches. G-Eval, an LLM-based quantitative metric, showed promising results, ranking the approaches similarly to the qualitative metric.
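Two of the quantitative metrics named above, Flesch-Kincaid grade level and SMOG Index, are closed-form readability formulas over word, sentence, and syllable counts. The sketch below implements their standard coefficients in Python to show what these scores measure. It is not the tooling used in the study: the regex tokenizer and the vowel-group syllable counter (`count_syllables`, `split_sentences`, `split_words`) are hypothetical helpers and rough heuristics, so absolute scores will differ from dictionary-based implementations. SARI, BERTScore, and G-Eval require reference texts, embedding models, or an LLM judge and are not covered here.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Heuristic (assumption): one syllable per group of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent 'e' ("live", "pressure") usually adds no syllable,
    # but a final "-le" ("people") usually does.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def split_sentences(text: str) -> list[str]:
    """Naive splitter: treat runs of '.', '!' or '?' as sentence boundaries."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def split_words(text: str) -> list[str]:
    """Alphabetic tokens, keeping simple contractions such as "don't"."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)


def flesch_kincaid_grade(text: str) -> float:
    """FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59."""
    sentences, words = split_sentences(text), split_words(text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


def smog_index(text: str) -> float:
    """SMOG = 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291.

    Polysyllables are words with 3+ syllables; the 30/sentences factor
    rescales texts shorter than SMOG's original 30-sentence sample.
    """
    sentences = split_sentences(text)
    if not sentences:
        raise ValueError("text must contain at least one sentence")
    polysyllables = sum(1 for w in split_words(text) if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291


if __name__ == "__main__":
    # Illustrative sentences written for this sketch, not taken from the dataset.
    original = ("Randomized controlled trials demonstrated significant reductions "
                "in cardiovascular mortality among hypertensive patients.")
    simplified = ("Studies showed that the drug helps people with high blood "
                  "pressure live longer.")
    for label, text in [("original", original), ("simplified", simplified)]:
        print(label, round(flesch_kincaid_grade(text), 2), round(smog_index(text), 2))
```

Lower values on both scores indicate text that should be readable at an earlier school grade, which is why a successful plain language adaptation is expected to reduce them relative to the source abstract. Neither formula checks meaning, which is one reason metrics such as SARI, BERTScore, and G-Eval are reported alongside them.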
