CV · Aug 6, 2025

ACM Multimedia Grand Challenge on ENT Endoscopy Analysis

Trong-Thuan Nguyen, Viet-Tham Huynh, Thao Thi Phuong Dao, Ha Nguyen Thi, Tien To Vu Thuy, Uyen Hanh Tran, Tam V. Nguyen, Thanh Dinh Le, Minh-Triet Tran

arXiv:2508.04801v1 · 3 citations · h-index: 11 · MM

Originality: Synthesis-oriented

AI Analysis

This work addresses automated analysis of endoscopic imagery in ENT care, but its contribution is incremental: it primarily establishes a new benchmark rather than advancing methods.

The paper introduces ENTRep, a grand challenge dataset for ENT endoscopy analysis, addressing the lack of public benchmarks by integrating fine-grained anatomical classification with image-to-image and text-to-image retrieval under bilingual clinical supervision, and reports results from top-performing teams.

Automated analysis of endoscopic imagery is a critical yet underdeveloped component of ENT (ear, nose, and throat) care, hindered by variability in devices and operators, subtle and localized findings, and fine-grained distinctions such as laterality and vocal-fold state. In addition to classification, clinicians require reliable retrieval of similar cases, both visually and through concise textual descriptions. These capabilities are rarely supported by existing public benchmarks. To this end, we introduce ENTRep, the ACM Multimedia 2025 Grand Challenge on ENT endoscopy analysis, which integrates fine-grained anatomical classification with image-to-image and text-to-image retrieval under bilingual (Vietnamese and English) clinical supervision. Specifically, the dataset comprises expert-annotated images, labeled for anatomical region and normal or abnormal status, and accompanied by dual-language narrative descriptions. In addition, we define three benchmark tasks, standardize the submission protocol, and evaluate performance on public and private test splits using server-side scoring. Moreover, we report results from the top-performing teams and provide an insightful discussion.
