Chunxu Luo

h-index7
2papers

2 Papers

97.6CYApr 9Code
MedThink: Enhancing Diagnostic Accuracy in Small Models via Teacher-Guided Reasoning Correction

Xinchun Su, Chunxu Luo, Lipeng Ma et al.

Accurate clinical diagnosis requires extensive domain knowledge and complex clinical reasoning capabilities. Although large language models (LLMs) hold great potential for clinical reasoning, their high computational and memory requirements limit their deployment in resource-constrained environments. Knowledge distillation (KD) can compress LLM capabilities into smaller models, but traditional KD merely transfers superficial answer patterns and fails to preserve the structured reasoning required for reliable diagnosis. To address this, we propose a two-stage distillation framework, MedThink, designed to cultivate robust clinical reasoning in small language models (SLMs). In the first stage, a teacher LLM screens data and injects domain-knowledge explanations to fine-tune a student model, establishing a knowledge foundation. In the second stage, the teacher evaluates the student's errors, generates reasoning chains linking knowledge to correct answers, and refines the student's diagnostic reasoning through a second round of fine-tuning. We evaluate MedThink on general medical benchmarks and a gastroenterology dataset comprising 955 question-answer pairs. Experiments demonstrate that MedThink outperforms six distillation strategies in all benchmarks: achieving an improvement of up to 12.7% over the student baseline in general tasks, and reaching a total top accuracy of 56.4% in gastroenterology evaluation. This indicates that iterative distillation centered on reasoning can significantly enhance the diagnostic accuracy and generalization capabilities of SLMs whilst maintaining computational efficiency. Our code and data are publicly available at https://github.com/destinybird/PrecisionBoost.

CLSep 27, 2025
MedCritical: Enhancing Medical Reasoning in Small Language Models via Self-Collaborative Correction

Xinchun Su, Chunxu Luo, Yixuan Li et al.

In the field of medicine, complex reasoning tasks such as clinical diagnosis, treatment planning, and medical knowledge integration pose significant challenges, where small language models often underperform compared to large language models like GPT-4 and Deepseek. Recent knowledge distillation-based methods aim to address these issues through teacher-guided error correction, but this LLM as judge approach remains challenging in terms of cost, time, and efficiency. To circumvent this issue, we propose a novel two-stage framework, MedCritical, which uses a small language model fine-tuned by a large teacher model to play against itself. In the first stage, we extract high-level and detailed long-chain thought templates from the teacher model to guide the student model to generate more complex reasoning thoughts. In the second stage, we introduce direct preference optimization (DPO) through model self-iteration collaboration to enhance the reasoning ability of the student model by playing against the correction trajectory of the fine-tuned model during training. This model self-learning DPO approach teaches the student model to use its own error-driven insights to consolidate its skills and knowledge to solve complex problems, and achieves comparable results to traditional knowledge distillation methods using teacher models at a lower cost. Notably, our MedCritical 7B model outperforms the Taiyi and Huatuo-o1-7B models by 3.04\% and 10.12\% respectively on the CMExam benchmark, achieving new SOTA performance among 7B-class small models.