CL · Feb 13, 2025

Improving TCM Question Answering through Tree-Organized Self-Reflective Retrieval with LLMs

arXiv:2502.09156v1 · 3 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses a domain-specific bottleneck in medical AI for Traditional Chinese Medicine practitioners and learners, offering incremental improvements in retrieval efficiency.

The paper tackles inefficient retrieval-augmented generation in Traditional Chinese Medicine question answering by proposing a Tree-Organized Self-Reflective Retrieval framework. The framework raises absolute accuracy by 19.85 percentage points on a licensing exam benchmark and lifts recall accuracy from 27% to 38% on a college exam dataset.
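The abstract labels the 19.85% MLE gain as absolute but does not say how to read the CCE recall change. Working from the quoted 27% and 38%, the two possible readings are:

```latex
\Delta_{\text{abs}} = 38\% - 27\% = 11 \text{ percentage points},
\qquad
\Delta_{\text{rel}} = \frac{38 - 27}{27} \approx 0.407 \approx 40.7\%
```

The 19.85% MLE figure corresponds to \(\Delta_{\text{abs}}\). Its relative size cannot be computed from this summary because the baseline accuracy is not stated.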

Objectives: Large language models (LLMs) can draw on medical knowledge for intelligent question answering (Q&A). This promises support for auxiliary diagnosis and for training medical professionals. However, efficient retrieval-augmented generation (RAG) frameworks for Traditional Chinese Medicine (TCM) are lacking. Our purpose is to evaluate the effect of the Tree-Organized Self-Reflective Retrieval (TOSRR) framework on LLMs in TCM Q&A tasks.

Materials and Methods: We introduce a new approach to knowledge organization that builds a hierarchical, tree-structured knowledge base. At inference time, our self-reflection framework retrieves from this knowledge base and integrates information across chapters. Benchmark datasets consist of questions randomly selected from the TCM Medical Licensing Examination (MLE) and the college Classics Course Exam (CCE).

Results: When coupled with GPT-4, the framework improves on the best performance on the TCM MLE benchmark by 19.85% in absolute accuracy. It also raises recall accuracy from 27% to 38% on the CCE dataset. In manual evaluation, the framework gains a total of 18.52 points across the dimensions of safety, consistency, explainability, compliance, and coherence.

Conclusion: The TOSRR framework can effectively improve LLMs' capability in TCM Q&A tasks.
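The abstract only outlines TOSRR: a hierarchical, tree-structured knowledge base and a self-reflective retrieval loop that can gather evidence from several chapters. The Python below is therefore a minimal sketch under stated assumptions, not the authors' implementation. The names `Node`, `descend`, and `reflective_retrieve`, the toy tree, the stop-word list, and the 0.8 coverage threshold are all hypothetical. A simple term-overlap score stands in for the embedding or LLM relevance judgment a real system would use. The "reflection" step is approximated by checking how many query terms the retrieved passages already cover.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical stop-word list so that function words do not count as evidence.
STOP = {"a", "an", "the", "and", "of", "to", "for", "in", "is", "which", "what", "with"}


@dataclass
class Node:
    """A node in a toy hierarchical knowledge base (corpus > chapter > section)."""
    title: str
    text: str = ""
    children: list[Node] = field(default_factory=list)


def tokens(s: str) -> set[str]:
    # Crude lexical tokenizer standing in for embeddings or an LLM judge.
    words = (w.strip(".,;:?!").lower() for w in s.split())
    return {w for w in words if w and w not in STOP}


def subtree_terms(node: Node) -> set[str]:
    # Terms in this node plus all descendants (acts as a chapter "summary").
    terms = tokens(node.title + " " + node.text)
    for child in node.children:
        terms |= subtree_terms(child)
    return terms


def descend(query: str, root: Node) -> Node:
    # Greedy top-down walk: at each level follow the child whose subtree best matches.
    q = tokens(query)
    node = root
    while node.children:
        best = max(node.children, key=lambda c: len(q & subtree_terms(c)))
        if not q & subtree_terms(best):
            break  # nothing below matches; stop at the current level
        node = best
    return node


def reflective_retrieve(query: str, root: Node, threshold: float = 0.8,
                        max_rounds: int = 3) -> list[Node]:
    q = tokens(query)
    passages: list[Node] = []
    for _ in range(max_rounds):
        seen = set().union(*(tokens(p.title + " " + p.text) for p in passages))
        missing = q - seen
        # Hypothetical "reflection": stop once enough of the query is covered.
        if not q or 1 - len(missing) / len(q) >= threshold:
            break
        # Re-query the tree with only the uncovered terms; this may reach another chapter.
        leaf = descend(" ".join(sorted(missing)), root)
        if leaf is root or leaf in passages:
            break  # no new evidence available
        passages.append(leaf)
    return passages


# Toy entries for illustration only.
root = Node("TCM corpus", children=[
    Node("Diagnostics", children=[
        Node("Pulse diagnosis", "A wiry pulse suggests liver qi stagnation."),
        Node("Tongue diagnosis", "A pale tongue suggests blood deficiency."),
    ]),
    Node("Formulas", children=[
        Node("Xiaoyao San", "Xiaoyao San soothes liver qi and nourishes blood."),
    ]),
])

print([n.title for n in reflective_retrieve("wiry pulse xiaoyao", root)])
# → ['Pulse diagnosis', 'Xiaoyao San']
```

Here the query's terms fall in two chapters. The first pass lands on the pulse-diagnosis section. The coverage check then finds that `xiaoyao` is still unaddressed, so a second top-down pass retrieves the formula section. For brevity the sketch recomputes subtree terms on every call. A practical system would precompute per-node summaries or embeddings.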
