GN AI CY HC LG
Apr 25, 2025

Artificial Intelligence health advice accuracy varies across languages and contexts

arXiv:2504.18310v1 | 1 citation | h-index: 26

Originality: Incremental advance

AI Analysis

The findings highlight the need for multilingual, domain-aware validation before deploying AI in global health communication, a critical gap for global public health applications.

The study benchmarked six leading large language models in 21 languages on register-authorized basic health statements and 9,100 journalist-vetted public-health assertions, finding that while accuracy is high on English-centric textbook claims, it falls in multiple non-European languages and varies by topic and source.

Using basic health statements authorized by UK and EU registers and 9,100 journalist-vetted public-health assertions on topics such as abortion, COVID-19 and politics, drawn from sources ranging from peer-reviewed journals and government advisories to social media and news across the political spectrum, we benchmark six leading large language models in 21 languages. Despite high accuracy on English-centric textbook claims, performance falls in multiple non-European languages and fluctuates by topic and source. These results highlight the urgency of comprehensive multilingual, domain-aware validation before deploying AI in global health communication.
