CL, May 23

HiMed: Incentivizing Hindi Reasoning in Medical LLMs

arXiv:2605.24635 | 86.2 | Has Code
AI Analysis

For Hindi-speaking populations and healthcare AI, this work tackles the language disparity in medical LLMs, though the approach is incremental.

HiMed addresses the severe underrepresentation of Hindi in medical LLMs by introducing a Hindi medical reasoning corpus and benchmark. It also proposes HiMed-8B, a model trained with a decaying scaffolding reward, which improves Hindi medical reasoning and reduces the English-Hindi accuracy gap.

Medical large language models hold promise for reducing healthcare disparities, yet Hindi remains severely underrepresented. While medical LLMs excel in high-resource languages, their performance degrades sharply in Hindi, particularly on Indian systems of medicine. We argue that robust cross-lingual medical transfer requires Hindi reasoning. To this end, we introduce HiMed, a Hindi medical reasoning corpus and benchmark suite covering both Western and Indian medicine. We further propose HiMed-8B, a Hindi-form medical reasoning LLM, trained with a decaying scaffolding reward. Extensive experiments demonstrate improved Hindi medical reasoning performance and a reduced English-Hindi accuracy gap. Ablation studies validate the contribution of each training stage and reward component. All data and code are available on GitHub: https://github.com/FreedomIntelligence/HiMed.
