CL | AI | Sep 24, 2025

TianHui: A Domain-Specific Large Language Model for Diverse Traditional Chinese Medicine Scenarios

arXiv:2509.19834v2 | h-index: 2 | Has Code
Originality: Synthesis-oriented
AI Analysis

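This work addresses two obstacles facing Traditional Chinese Medicine (TCM) researchers: the constrained adaptability of domain-specific LLMs and the shortage of evaluation datasets. It is a domain-specific advancement rather than a general-purpose one.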

The study tackles the limitations of domain-specific LLMs in TCM by developing TianHui, a specialized model. Across 12 benchmarks, TianHui ranked in the top three on every metric for six datasets and achieved the top results on the other six.

Domain-specific LLMs in TCM face limitations in research settings due to constrained adaptability, insufficient evaluation datasets, and limited computational resources. This study presents TianHui, a specialized TCM LLM built through contextual data integration and domain knowledge fusion. We constructed a large-scale TCM corpus (0.97GB unsupervised data + 611,312 QA pairs) and employed a two-stage training strategy with QLoRA, DeepSpeed Stage 2, and Flash Attention 2. Evaluation on 12 benchmarks showed TianHui ranked top-three in all metrics for six datasets (APQ, TCMCD, HFR, HCCA, DHPE, TLAW) and achieved top results in the other six (TCMEE, APR, GCPMI, TCMKQA, TCMRC, ADTG). Optimal configuration was identified as LoRA rank=128, alpha=256, epoch=4, dropout=0.2, max length=2048. TianHui enables systematic preservation and scalable application of TCM knowledge. All resources are open-sourced.
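The reported optimal configuration can be written down as a small settings object. The sketch below is illustrative only, not the authors' training code: the class name `LoraSettings`, its helper methods, and the 4096 x 4096 layer size are assumptions introduced here. It records the hyperparameters from the abstract and derives two quantities that follow from the standard LoRA formulation: the scaling factor alpha / rank, and the number of adapter parameters added to a single weight matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """Hyperparameters reported as optimal in the abstract (class name is hypothetical)."""
    rank: int = 128
    alpha: int = 256
    dropout: float = 0.2
    epochs: int = 4
    max_length: int = 2048

    @property
    def scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update by alpha / rank.
        return self.alpha / self.rank

    def adapter_params(self, d_in: int, d_out: int) -> int:
        # One adapted weight gets A (rank x d_in) and B (d_out x rank).
        return self.rank * (d_in + d_out)


settings = LoraSettings()
print(settings.scaling)

# 4096 x 4096 is a hypothetical projection size, not a figure from the paper.
print(settings.adapter_params(4096, 4096))
```

With rank=128 and alpha=256 the update is scaled by 2.0, and under the assumed layer size each adapted matrix adds 1,048,576 trainable parameters.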
