CL · Sep 14, 2025

!MSA at AraHealthQA 2025 Shared Task: Enhancing LLM Performance for Arabic Clinical Question Answering through Prompt Engineering and Ensemble Learning

arXiv:2509.11365v1 · 2 citations · h-index: 7
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of improving question-answering accuracy in Arabic clinical contexts, which matters for both healthcare and NLP applications. The contribution is incremental, however, since it builds on existing LLMs with tailored prompts.

The paper tackles Arabic clinical question answering by improving LLM performance through prompt engineering and ensemble learning, placing 2nd in both the multiple-choice and the open-ended sub-tasks of the AraHealthQA 2025 shared task.

We present our systems for Track 2 (General Arabic Health QA, MedArabiQ) of the AraHealthQA 2025 shared task, where our methodology secured 2nd place in both Sub-Task 1 (multiple-choice question answering) and Sub-Task 2 (open-ended question answering) in Arabic clinical contexts. For Sub-Task 1, we use the Gemini 2.5 Flash model with few-shot prompting, dataset preprocessing, and an ensemble of three prompt configurations to improve classification accuracy on standard, biased, and fill-in-the-blank questions. For Sub-Task 2, we use a unified prompt with the same model, combining role-playing as an Arabic medical expert, few-shot examples, and post-processing to generate concise responses across fill-in-the-blank, patient-doctor Q&A, grammatical error correction (GEC), and paraphrased variants.
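The prompting and ensembling steps described above can be illustrated with a minimal Python sketch. It does not call Gemini 2.5 Flash: model replies are passed in as plain strings. Everything named here is hypothetical and is not the authors' code: the instruction wordings in `PROMPT_CONFIGS`, the assumption that options are labeled with Latin letters A-D, and the helpers `build_prompt`, `extract_letter`, `majority_vote` and `postprocess_open`. In particular, the abstract does not say how the three prompt configurations are combined, so majority voting with first-configuration tie-breaking is an assumed rule. Likewise, trimming open-ended answers to their first sentence is only a stand-in for the paper's post-processing.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical stand-ins for the three prompt configurations (not the paper's wording).
PROMPT_CONFIGS = [
    "You are an Arabic medical expert. Answer with the letter of the correct option only.",
    "Read the clinical question carefully and reply with a single option letter.",
    "Choose the best answer. Output only one letter (A, B, C, or D).",
]


def build_prompt(instruction: str, examples: list[tuple[str, str]], question: str) -> str:
    """Few-shot prompt: instruction, solved examples, then the target question."""
    parts = [instruction]
    for q, a in examples:
        parts.append(f"Question: {q}\nAnswer: {a}")
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


def extract_letter(response: str) -> Optional[str]:
    """Post-process a raw reply into one option letter (assumes Latin A-D labels)."""
    # Case-sensitive so that words such as "Answer" do not match as option "A".
    match = re.search(r"\b([A-D])\b", response)
    return match.group(1) if match else None


def majority_vote(predictions: list[Optional[str]]) -> Optional[str]:
    """Assumed ensemble rule: most frequent answer; ties go to the earliest configuration."""
    valid = [p for p in predictions if p is not None]
    if not valid:
        return None
    counts = Counter(valid)
    top = max(counts.values())
    for p in valid:  # configuration order makes tie-breaking deterministic
        if counts[p] == top:
            return p
    return None


def postprocess_open(response: str, max_sentences: int = 1) -> str:
    """Hypothetical concision step: keep the first sentence(s); '؟' is the Arabic question mark."""
    sentences = re.split(r"(?<=[.!?؟])\s+", response.strip())
    return " ".join(sentences[:max_sentences])


if __name__ == "__main__":
    # Made-up replies, one per prompt configuration.
    replies = ["Answer: B", "B", "The correct option is C."]
    votes = [extract_letter(r) for r in replies]
    print(votes)                 # ['B', 'B', 'C']
    print(majority_vote(votes))  # B
```

Keeping answer extraction separate from voting means a reply that yields no recognizable letter simply abstains instead of casting a wrong vote.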
