CL, AI | May 23, 2025

PerMedCQA: Benchmarking Large Language Models on Medical Consumer Question Answering in Persian Language

Naghmeh Jamali, Milad Mohammadi, Danial Baledi, Zahra Rezvani, Hesham Faili

arXiv:2505.18331v1 | 2.71 citations | h-index: 11 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the scarcity of consumer-oriented medical QA resources for low-resource languages such as Persian and gives researchers and developers a benchmark to evaluate against. The contribution is incremental, since it applies existing methods to a new dataset.

The authors tackled the lack of multilingual medical consumer question answering resources by creating PerMedCQA, a Persian-language benchmark with 68,138 question-answer pairs, and evaluated state-of-the-art LLMs using a novel rubric-based framework, revealing key challenges in multilingual medical QA.

Medical consumer question answering (CQA) is crucial for empowering patients by providing personalized and reliable health information. Despite recent advances in large language models (LLMs) for medical QA, consumer-oriented and multilingual resources, particularly in low-resource languages like Persian, remain sparse. To bridge this gap, we present PerMedCQA, the first Persian-language benchmark for evaluating LLMs on real-world, consumer-generated medical questions. Curated from a large medical QA forum, PerMedCQA contains 68,138 question-answer pairs, refined through careful data cleaning from an initial set of 87,780 raw entries. We evaluate several state-of-the-art multilingual and instruction-tuned LLMs using MedJudge, a novel rubric-based evaluation framework driven by an LLM grader and validated against expert human annotators. Our results highlight key challenges in multilingual medical QA and provide valuable insights for developing more accurate and context-aware medical assistance systems. The data is publicly available at https://huggingface.co/datasets/NaghmehAI/PerMedCQA
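
As a rough illustration of the figures in the abstract, the sketch below computes how much of the raw forum data survived cleaning and shows one way the published dataset might be loaded. The repository id is taken from the URL above; the helper names `cleaning_summary` and `load_permedcqa` are hypothetical, and the split and column names are not stated in the abstract, so they should be inspected after loading rather than assumed.

```python
# Repository id taken from the Hugging Face URL given in the abstract.
DATASET_ID = "NaghmehAI/PerMedCQA"

# Counts reported in the abstract.
RAW_ENTRIES = 87_780   # initial raw forum entries
CLEAN_PAIRS = 68_138   # question-answer pairs kept after cleaning


def cleaning_summary(raw: int, clean: int) -> tuple[int, float]:
    """Return (number of entries removed, fraction of entries retained)."""
    removed = raw - clean
    return removed, clean / raw


def load_permedcqa():
    """Hypothetical helper: download the dataset from the Hugging Face Hub.

    Assumes the `datasets` package is installed and the repository is
    accessible without extra configuration; neither is confirmed by the
    abstract. The import is done lazily so the arithmetic above works
    without the dependency.
    """
    from datasets import load_dataset
    return load_dataset(DATASET_ID)


removed, kept = cleaning_summary(RAW_ENTRIES, CLEAN_PAIRS)
print(f"removed {removed} entries, kept {kept:.1%} of the raw data")
```

Calling `load_permedcqa()` and printing the result should list whatever splits and columns the authors published, which is the safest way to discover the schema before writing evaluation code.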

