CLLGMar 28

Development and Preliminary Evaluation of a Domain-Specific Large Language Model for Tuberculosis Care in South Africa

arXiv:2604.1977669.3
Predicted impact top 91% in CL · last 90 daysOriginality Synthesis-oriented
AI Analysis

This work addresses the burden of tuberculosis on patients and healthcare providers in South Africa, but it is incremental as it builds on existing medical LLMs with fine-tuning.

The authors developed a domain-specific large language model for tuberculosis care in South Africa, which showed improved performance over a base medical model in contextual alignment metrics.

Tuberculosis (TB) is one of the world's deadliest infectious diseases, and in South Africa, it contributes a significant burden to the country's health care system. This paper presents an experimental study on the development of a domain-specific Large Language Model (DS-LLM) for TB care that can help to alleviate the burden on patients and healthcare providers. To achieve this, a literature review was conducted to understand current LLM development strategies, specifically in the medical domain. Thereafter, data were collected from South African TB guidelines, selected TB literature, and existing benchmark medical datasets. We performed LLM fine-tuning by using the Quantised Low-Rank Adaptation (QLoRA) algorithm on a medical LLM (BioMistral-7B), and also implemented Retrieval-Augmented Generation using GraphRAG. The developed DS-LLM was evaluated against the base BioMistral-7B model and a general-purpose LLM using a mix of automated metrics and quantitative ratings. The results show that the DS-LLM had better performance compared to the base model in terms of its contextual alignment (lexical, semantic, and knowledge) for TB care in South Africa.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes