Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making
Baichuan-M3 addresses the need for reliable, clinical-grade decision support in medical consultations. The authors frame it as a shift from passive question-answering to active decision support rather than an incremental improvement.
The paper targets the passive question-answering behavior of current medical AI systems. It introduces Baichuan-M3, a model that supports clinical decision-making through proactive information acquisition, long-horizon reasoning, and hallucination suppression. The authors report state-of-the-art results on HealthBench, HealthBench-Hallu, and ScanBench, and state that the model significantly outperforms GPT-5.2 in clinical inquiry, advisory, and safety.
We introduce Baichuan-M3, a medical-enhanced large language model engineered to shift the paradigm from passive question-answering to active, clinical-grade decision support. Addressing the limitations of existing systems in open-ended consultations, Baichuan-M3 utilizes a specialized training pipeline to model the systematic workflow of a physician. Key capabilities include: (i) proactive information acquisition to resolve ambiguity; (ii) long-horizon reasoning that unifies scattered evidence into coherent diagnoses; and (iii) adaptive hallucination suppression to ensure factual reliability. Empirical evaluations demonstrate that Baichuan-M3 achieves state-of-the-art results on HealthBench, the newly introduced HealthBench-Hallu and ScanBench, significantly outperforming GPT-5.2 in clinical inquiry, advisory and safety. The models are publicly available at https://huggingface.co/collections/baichuan-inc/baichuan-m3.