QM CV IR ME | Jun 26, 2024

Concordance in basal cell carcinoma diagnosis. Building a proper ground truth to train Artificial Intelligence tools

Francisca Silva-Clavería, Carmen Serrano, Iván Matas, Amalia Serrano, Tomás Toledo-Pastrana, Begoña Acha

arXiv:2406.18240v1 | 1 citation

Originality: Synthesis-oriented

AI Analysis

The paper addresses the need for a reliable ground truth when training AI for medical diagnostics, but its contribution is incremental: it focuses on improving ground-truth methods for a single domain.

The study tackles inconsistent ground truth in basal cell carcinoma (BCC) diagnosis by analyzing agreement among four dermatologists on dermoscopic criteria. It finds that an AI tool's performance differs significantly depending on whether it is trained on a single dermatologist's ground truth or on a consensus statistically inferred from all four.

Background: The existence of different basal cell carcinoma (BCC) clinical criteria cannot be objectively validated. An adequate ground truth is needed to train an artificial intelligence (AI) tool that explains a BCC diagnosis by reporting its dermoscopic features.

Objectives: To determine the consensus among dermatologists on the dermoscopic criteria of 204 BCC lesions, and to analyze the performance of an AI tool when the ground truth is inferred.

Methods: A single-center, prospective diagnostic study was conducted to analyze the agreement of four dermatologists on dermoscopic criteria and then derive a reference standard. A total of 1434 dermoscopic images were used; each was taken by a primary health physician, sent via teledermatology, and diagnosed by a dermatologist. The images were randomly selected from the teledermatology platform (2019-2021). Of these, 204 were used to test an AI tool and the remainder were used to train it. The performance of the AI tool trained on the ground truth of one dermatologist was compared with its performance when trained on the ground truth statistically inferred from the consensus of four dermatologists, using McNemar's test and Hamming distance.

Results: Dermatologists achieved almost perfect agreement in the diagnosis of BCC (Fleiss' kappa = 0.9079) and high agreement with biopsy (PPV = 0.9670). However, agreement was low in detecting some dermoscopic criteria. Statistically significant differences were found between the performance of the AI tool trained on the ground truth of one dermatologist and that of the tool trained on the ground truth statistically inferred from the consensus of four dermatologists.

Conclusions: Care should be taken when training an AI tool to determine the BCC patterns present in a lesion. The ground truth should be established from multiple dermatologists.
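The abstract names three statistics: Fleiss' kappa for agreement among several raters, Hamming distance between binary label vectors, and McNemar's test for paired classifier outcomes. The following is a minimal sketch of how each could be computed, not the authors' code. The function names, the example data, and the choice of the exact (binomial) form of McNemar's test are all assumptions; the abstract does not say which variant was used.

```python
from math import comb


def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters who put item i in category j.

    Every row must sum to the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    # Share of all ratings that fall in each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each item, then averaged over items.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


def hamming_distance(a, b):
    """Number of positions at which two equal-length label vectors differ."""
    if len(a) != len(b):
        raise ValueError("label vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant counts.

    b = cases model A got right and model B got wrong; c = the reverse.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical data: one lesion, presence (1) or absence (0) of four
# dermoscopic criteria as labeled by a single rater and by a consensus.
single_rater = [1, 0, 1, 1]
consensus = [1, 1, 1, 0]
print(hamming_distance(single_rater, consensus))
```

In this sketch a smaller Hamming distance means the predicted set of criteria is closer to the reference, and a small McNemar p-value indicates that two models disagree with the reference on different cases more often than chance would explain.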
