IV, AI, CV · Jul 24, 2025

TCM-Tongue: A Standardized Tongue Image Dataset with Pathological Annotations for AI-Assisted TCM Diagnosis

Xuebo Jin, Longfei Gao, Anshuo Tong, Zhengyang Chen, Jianlei Kong, Ning Sun, Huijun Ma, Qiang Wang, Yuting Bai, Tingli Su

arXiv:2507.18288v1 · 5.1 · 1 citation · h-index: 6

Originality: Synthesis-oriented

AI Analysis

This work addresses the data shortage that has slowed AI development for TCM diagnosis and eases the integration of AI into research and clinical practice. It is incremental, however, because it contributes a dataset rather than new methods.

The authors tackled the lack of standardized datasets for AI-assisted Traditional Chinese Medicine tongue diagnosis by creating a dataset of 6,719 high-quality images with 20 pathological symptom categories, averaging 2.54 clinically validated labels per image, and benchmarked it with nine deep learning models to demonstrate its utility.

Traditional Chinese medicine (TCM) tongue diagnosis, while clinically valuable, faces standardization challenges due to subjective interpretation and inconsistent imaging protocols, compounded by the lack of large-scale, annotated datasets for AI development. To address this gap, we present the first specialized dataset for AI-driven TCM tongue diagnosis, comprising 6,719 high-quality images captured under standardized conditions and annotated with 20 pathological symptom categories (averaging 2.54 clinically validated labels per image, all verified by licensed TCM practitioners). The dataset supports multiple annotation formats (COCO, TXT, XML) for broad usability and has been benchmarked using nine deep learning models (YOLOv5/v7/v8 variants, SSD, and MobileNetV2) to demonstrate its utility for AI development. This resource provides a critical foundation for advancing reliable computational tools in TCM, addresses the data shortage that has hindered progress in the field, and facilitates the integration of AI into both research and clinical practice through standardized, high-quality diagnostic data.
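The abstract lists three annotation formats but does not say which TXT or XML conventions are used. As a rough sketch, assuming the labels are bounding boxes (as the YOLO and SSD benchmarks suggest), that TXT means YOLO-style normalized lines and that XML means Pascal VOC, one COCO box (`[x_min, y_min, width, height]` in pixels) maps to the other two formats as below. The function names, the label name `label_a`, and the 640×480 image size are hypothetical illustrations, not taken from the dataset.

```python
import xml.etree.ElementTree as ET


def coco_to_yolo(bbox, img_w, img_h):
    """COCO [x_min, y_min, w, h] in pixels -> YOLO (cx, cy, w, h) normalized to [0, 1].

    Assumption: the dataset's TXT files follow the YOLO convention.
    """
    x, y, w, h = bbox
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)


def yolo_line(class_id, bbox, img_w, img_h):
    """Format one YOLO TXT line: '<class> <cx> <cy> <w> <h>'."""
    cx, cy, w, h = coco_to_yolo(bbox, img_w, img_h)
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def voc_object(name, bbox):
    """Build a Pascal VOC-style <object> element with corner coordinates.

    Assumption: the dataset's XML files follow the Pascal VOC layout.
    """
    x, y, w, h = bbox
    obj = ET.Element("object")
    ET.SubElement(obj, "name").text = name
    box = ET.SubElement(obj, "bndbox")
    for tag, val in (("xmin", x), ("ymin", y), ("xmax", x + w), ("ymax", y + h)):
        ET.SubElement(box, tag).text = str(int(round(val)))
    return obj


if __name__ == "__main__":
    # Hypothetical box on a hypothetical 640x480 image.
    print(yolo_line(0, [100, 50, 200, 100], 640, 480))
    # prints "0 0.312500 0.208333 0.312500 0.208333"
    print(ET.tostring(voc_object("label_a", [100, 50, 200, 100]), encoding="unicode"))
```

Because images carry about 2.54 labels on average, a per-image annotation file would usually hold several such TXT lines or `<object>` elements rather than one.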

