IV · AI · CV · Jul 24, 2025

TCM-Tongue: A Standardized Tongue Image Dataset with Pathological Annotations for AI-Assisted TCM Diagnosis

arXiv:2507.18288v1 · 1 citation · h-index: 6
Originality: Synthesis-oriented
AI Analysis

The dataset addresses the data shortage that has limited AI development in TCM diagnosis and eases integration into research and clinical practice. The contribution is incremental, however: it focuses on dataset creation rather than novel methods.

The authors address the lack of standardized datasets for AI-assisted Traditional Chinese Medicine tongue diagnosis by creating a dataset of 6,719 high-quality images annotated with 20 pathological symptom categories, averaging 2.54 clinically validated labels per image. They benchmark it with nine deep learning models to demonstrate its utility.

Traditional Chinese medicine (TCM) tongue diagnosis, while clinically valuable, faces standardization challenges due to subjective interpretation and inconsistent imaging protocols, compounded by the lack of large-scale, annotated datasets for AI development. To address this gap, we present the first specialized dataset for AI-driven TCM tongue diagnosis, comprising 6,719 high-quality images captured under standardized conditions and annotated with 20 pathological symptom categories (averaging 2.54 clinically validated labels per image, all verified by licensed TCM practitioners). The dataset supports multiple annotation formats (COCO, TXT, XML) for broad usability and has been benchmarked using nine deep learning models (YOLOv5/v7/v8 variants, SSD, and MobileNetV2) to demonstrate its utility for AI development. This resource provides a critical foundation for advancing reliable computational tools in TCM, bridging the data shortage that has hindered progress in the field, and facilitating the integration of AI into both research and clinical practice through standardized, high-quality diagnostic data.
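As a rough illustration of what a figure like "2.54 labels per image" means in a COCO-format annotation file, the sketch below computes the mean number of distinct category labels per image. It assumes the dataset's COCO export uses the standard top-level keys (`images`, `annotations`, `categories`) and the standard `image_id` / `category_id` fields; the abstract does not confirm this. The file path, the toy data, and the category names are hypothetical placeholders, and the abstract does not say whether its average counts distinct categories or individual annotation instances.

```python
import json
from collections import defaultdict


def load_coco(path: str) -> dict:
    """Load a COCO-style annotation file (path is supplied by the caller)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def mean_labels_per_image(coco: dict) -> float:
    """Average number of DISTINCT category labels per image.

    Assumes the standard COCO layout: each annotation carries
    'image_id' and 'category_id'. Images without annotations count as 0.
    """
    labels = defaultdict(set)
    for ann in coco.get("annotations", []):
        labels[ann["image_id"]].add(ann["category_id"])

    image_ids = [img["id"] for img in coco.get("images", [])]
    if not image_ids:
        return 0.0
    return sum(len(labels[i]) for i in image_ids) / len(image_ids)


# Tiny synthetic example (NOT real dataset content; names are placeholders).
toy = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "categories": [
        {"id": 1, "name": "category_a"},
        {"id": 2, "name": "category_b"},
        {"id": 3, "name": "category_c"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 1, "category_id": 2},
        {"id": 12, "image_id": 1, "category_id": 2},  # repeated label, counted once
        {"id": 13, "image_id": 2, "category_id": 3},
        # image 3 has no annotations
    ],
}

print(mean_labels_per_image(toy))
```

On a real export, one would call `mean_labels_per_image(load_coco(path))` with the actual annotation file. Counting distinct categories rather than raw annotation instances is a design choice made here for illustration; counting instances would give a higher number whenever a symptom is marked more than once in an image.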

