CL LG MM Mar 18

Dental-TriageBench: Benchmarking Multimodal Reasoning for Hierarchical Dental Triage

Ziyi He, Yushi Feng, Shuangyu Yang, Yinghao Zhu, Xichen Zhang, Pak Chuen Patrick Tai, Hei Yuet Lo, Songying Wu, Weifa Yang, Lequan Yu

arXiv:2604.13060 83.1 h-index: 4 Has Code

AI Analysis

This work addresses the safety-critical need for better multimodal AI in dental clinical routing, though its contribution is incremental: it introduces a new benchmark rather than a novel method.

The authors tackled the problem of dental triage by creating Dental-TriageBench, the first expert-annotated benchmark for multimodal reasoning in this domain, and found a substantial gap between 19 tested MLLMs and human baselines, with models performing poorly on fine-grained treatment-level triage.

Dental triage is a safety-critical clinical routing task that requires integrating multimodal clinical information (e.g., patient complaints and radiographic evidence) to determine complete referral plans. We present Dental-TriageBench, the first expert-annotated benchmark for reasoning-driven multimodal dental triage. Built from authentic outpatient workflows, it contains 246 de-identified cases annotated with expert-authored golden reasoning trajectories, together with hierarchical triage labels. We benchmark 19 proprietary, open-source, and medical-domain MLLMs against three junior dentists serving as the human baseline, and find a substantial human-model gap, particularly on fine-grained treatment-level triage. Further analyses show that accurate triage requires both complaint and OPG (orthopantomogram, i.e., panoramic radiograph) information, and that model errors concentrate on cases with multiple referral domains, where MLLMs tend to produce overly narrow referral sets and omission-heavy errors. Dental-TriageBench provides a realistic testbed for developing multimodal clinical AI systems that are more clinically grounded, coverage-aware, and safer for downstream care.
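To make "overly narrow referral sets" and "omission-heavy errors" concrete, the following is a minimal sketch of how a predicted referral set might be compared with an expert one. It is not the benchmark's actual scoring code: the function `score_referrals`, the `ReferralScore` type, and the domain names are hypothetical, and the abstract does not state which metrics the authors use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferralScore:
    omitted: frozenset   # expert referrals the model missed (omission errors)
    extra: frozenset     # model referrals the expert did not make (over-referral)
    recall: float        # share of expert referrals the model covered
    precision: float     # share of model referrals that the expert also made


def score_referrals(gold, predicted):
    """Compare a predicted referral set with the expert (gold) set.

    Hypothetical helper for illustration only.
    """
    gold, predicted = frozenset(gold), frozenset(predicted)
    hits = gold & predicted
    # Convention assumed here: an empty set scores 1.0 (nothing to miss or get wrong).
    recall = len(hits) / len(gold) if gold else 1.0
    precision = len(hits) / len(predicted) if predicted else 1.0
    return ReferralScore(gold - predicted, predicted - gold, recall, precision)


# Hypothetical multi-domain case: the expert refers to three domains,
# while the model names only one of them.
gold = {"endodontics", "periodontics", "oral_surgery"}
pred = {"endodontics"}
score = score_referrals(gold, pred)
print(sorted(score.omitted), score.recall, score.precision)
```

In this made-up case the single predicted referral is correct, so precision is perfect, yet two of the three expert referrals are missing. A narrow prediction of this kind looks accurate under precision alone, which is why a coverage-aware measure such as recall or an omission count is the more informative view for multi-domain cases.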
