CL, AI · Dec 18, 2025

Plain language adaptations of biomedical text using LLMs: Comparison of evaluation metrics

Primoz Kocbek, Leon Kopitar, Gregor Stiglic

arXiv:2512.16530v1 · 4.91 citations · h-index: 11 · Studies in Health Technology and Informatics

Originality: Synthesis-oriented

AI Analysis

This work addresses health literacy by making biomedical information more accessible, though the contribution is incremental, as it compares existing LLM approaches on an established task.

The study applied Large Language Models (LLMs) to simplify biomedical texts for better health literacy, finding that gpt-4o-mini outperformed the other approaches and that G-Eval scores aligned well with the qualitative assessments.
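The summary says G-Eval scores aligned well with the qualitative assessments. One common way to quantify that kind of agreement is Spearman's rank correlation between per-approach scores; this is an assumption for illustration, since the summary does not state how alignment was measured. The sketch below uses hypothetical helper names (`average_ranks`, `spearman_rho`) and computes Spearman's rho as the Pearson correlation of average ranks, which handles ties correctly.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / (vx * vy) ** 0.5
```

Passing, for example, the mean G-Eval score and the mean Likert score of each approach (in the same approach order) returns a value close to 1 when both metrics rank the approaches alike and close to -1 when they rank them in opposite order.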

This study investigated the application of Large Language Models (LLMs) for simplifying biomedical texts to enhance health literacy. Using a public dataset of plain language adaptations of biomedical abstracts, we developed and evaluated three approaches: a baseline approach using a prompt template, a two-AI-agent approach, and a fine-tuning (FT) approach. We selected the OpenAI gpt-4o and gpt-4o-mini models as baselines for further research. We evaluated our approaches with quantitative metrics (Flesch-Kincaid Grade Level, SMOG Index, SARI, BERTScore, and G-Eval) and with a qualitative metric, namely 5-point Likert scales for simplicity, accuracy, completeness, and brevity. Results showed superior performance of gpt-4o-mini and underperformance of the FT approaches. G-Eval, an LLM-based quantitative metric, showed promising results, ranking the approaches similarly to the qualitative metric.
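Two of the quantitative metrics named in the abstract, Flesch-Kincaid Grade Level and the SMOG Index, are closed-form readability formulas, with lower values meaning simpler text. The sketch below implements the standard formulas; the vowel-group syllable counter and the regex-based sentence and word splitting are simplifying assumptions, so scores will differ somewhat from dedicated readability libraries, and the paper's own tooling is not stated here. The helper names are hypothetical.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Heuristic (assumption): count vowel groups (a, e, i, o, u, y), minimum 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fkgl_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def smog_from_counts(polysyllables: int, sentences: int) -> float:
    """SMOG grade; the 30 / sentences factor normalizes to a 30-sentence sample."""
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


def readability(text: str) -> dict:
    """FKGL and SMOG for a plain-text passage, using the heuristics above."""
    # Naive splitting (assumption): sentences end at ., ! or ?; words are letter runs.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllable_counts = [count_syllables(w) for w in words]
    n_sent = max(1, len(sentences))
    n_words = max(1, len(words))
    polysyllables = sum(1 for c in syllable_counts if c >= 3)
    return {
        "fkgl": fkgl_from_counts(n_words, n_sent, sum(syllable_counts)),
        "smog": smog_from_counts(polysyllables, n_sent),
    }


if __name__ == "__main__":
    print(readability("The cat sat. The dog ran."))
```

Calling `readability` on an original abstract and on its plain language adaptation gives a quick check of whether the adaptation lowered the estimated grade level; SARI, BERTScore, and G-Eval need reference texts or models and are not covered by this sketch.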

