CL AI IR LG May 13

IndicMedDialog: A Parallel Multi-Turn Medical Dialogue Dataset for Accessible Healthcare in Indic Languages

arXiv:2605.13292 13.6
AI Analysis

For healthcare AI researchers and practitioners in India, this work provides a realistic multilingual medical dialogue resource and a fine-tuned model, though the approach is incremental (extending existing datasets and fine-tuning a small model).

The authors introduce IndicMedDialog, a parallel multi-turn medical dialogue dataset covering English and nine Indic languages, and fine-tune a small language model (IndicMedLM) for multilingual symptom elicitation. The model outperforms zero-shot baselines across all languages, with expert evaluation confirming clinical plausibility.

Most existing medical dialogue systems operate in a single-turn question-answering paradigm or rely on template-based datasets, limiting conversational realism and multilingual applicability. We introduce IndicMedDialog, a parallel multi-turn medical dialogue dataset spanning English and nine Indic languages: Assamese, Bengali, Gujarati, Hindi, Marathi, Punjabi, Tamil, Telugu, and Urdu. The dataset extends MDDial with LLM-generated synthetic consultations, translated using TranslateGemma, verified by native speakers, and refined through a script-aware post-processing pipeline to correct phonetic, lexical, and character-spacing errors. Building on this dataset, we fine-tune IndicMedLM via parameter-efficient adaptation of a quantized small language model, incorporating optional patient pre-context to personalise multi-turn symptom elicitation. We evaluate against zero-shot multilingual baselines, conduct systematic error analysis across ten languages, and validate clinical plausibility through medical expert evaluation.
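
The abstract does not describe how the script-aware post-processing works, so the following is only an illustrative sketch of one kind of character-spacing fix, under the assumption that a common error is a stray space inserted between a base consonant and its dependent vowel sign or virama. The function name `fix_mark_spacing` is hypothetical and is not taken from the paper; the rule relies only on Unicode general categories from Python's standard `unicodedata` module.

```python
import unicodedata

# Unicode general categories for combining marks:
# "Mn" (nonspacing, e.g. virama) and "Mc" (spacing combining, e.g. many vowel signs).
COMBINING_CATEGORIES = {"Mn", "Mc"}


def fix_mark_spacing(text: str) -> str:
    """Remove spaces that directly precede a combining mark.

    Assumption (not from the paper): a dependent sign should attach to the
    preceding base character, so a space in front of it is treated as an error.
    """
    out = []
    for ch in text:
        if unicodedata.category(ch) in COMBINING_CATEGORIES:
            # Drop any spaces wrongly inserted before the dependent sign.
            while out and out[-1] == " ":
                out.pop()
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    broken = "\u0915 \u093f"  # Devanagari KA, space, VOWEL SIGN I
    print(fix_mark_spacing(broken) == "\u0915\u093f")
```

Because the rule keys on character categories rather than a per-language table, the same function applies to every script in the dataset that encodes dependent signs as combining marks. Phonetic and lexical corrections would need language-specific resources that this sketch does not attempt.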

