Implicit Geographic Inference in LLM Medical Triage: Language-Driven Disparities in Emergency Recommendations

arXiv:2606.01204


AI Analysis

The paper reveals a language-driven bias in LLM medical triage that could lead to disparities in emergency care recommendations for non-English speakers.

LLMs produce different medical triage recommendations for identical symptoms based solely on the language of the patient prompt, with ER recommendation rates ranging from 0% to 30% across six languages despite similar severity scores.

We investigate whether large language models produce different medical triage recommendations for identical symptoms based solely on the language of the patient prompt. Using Gemini 3.5 Flash, we evaluate a neurological symptom profile (persistent headache, blurred vision, nausea) across six languages (English, Spanish, Chinese, Hindi, Japanese, Arabic) with 30 runs per condition (n=450 total API calls). We find that the model recommends emergency room visits at rates ranging from 0% (Japanese, Hindi) to 30% (English, Arabic), despite assigning nearly identical severity scores (7.7-8.0/10) across all languages. Adding a single sentence specifying the patient's US location increases ER recommendations by up to 76.7 percentage points for non-English prompts, while the reverse anchor (English prompt with a Tokyo location) reduces the ER rate from 30% to 6.7%. A back-translation control (Japanese to English) produces ER rates comparable to the English baseline, confirming that the disparity is not caused by translation quality but by implicit geographic inference from the input language. We release the complete dataset, experiment code, and results.
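The reported percentages can be read back as run counts, since each condition has 30 runs and every rate is therefore a multiple of 1/30. The following is a minimal sketch of that arithmetic only; the function names `er_rate` and `implied_count` are hypothetical and are not taken from the authors' released code.

```python
# Sketch of the rate arithmetic implied by the abstract.
# Only the constants below come from the abstract; the helper names are illustrative.

RUNS_PER_CONDITION = 30   # "30 runs per condition"
TOTAL_CALLS = 450         # "n=450 total API calls"


def er_rate(er_count: int, runs: int = RUNS_PER_CONDITION) -> float:
    """Percentage of runs that recommended an ER visit, to one decimal place."""
    return round(100 * er_count / runs, 1)


def implied_count(rate_percent: float, runs: int = RUNS_PER_CONDITION) -> int:
    """Nearest whole number of runs consistent with a reported percentage."""
    return round(rate_percent / 100 * runs)


# Number of experimental conditions implied by the totals.
n_conditions = TOTAL_CALLS // RUNS_PER_CONDITION

# Rates quoted in the abstract, converted back to run counts.
english_baseline = implied_count(30.0)   # English prompt, no location
english_tokyo = implied_count(6.7)       # English prompt with a Tokyo location
largest_us_shift = implied_count(76.7)   # largest US-location increase, in runs

# Size of the reverse-anchor effect in percentage points.
tokyo_drop_pp = round(er_rate(english_baseline) - er_rate(english_tokyo), 1)

print(n_conditions, english_baseline, english_tokyo, largest_us_shift, tokyo_drop_pp)
```

Under these assumptions, the 30% baseline corresponds to 9 of 30 runs, 6.7% to 2 of 30, and the 76.7-point increase to a difference of 23 runs, so a change of a single run moves a rate by about 3.3 percentage points.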
