Ranking XAI Methods for Head and Neck Cancer Outcome Prediction

Baoqiang Ma, Djennifer K. Madzia-Madzou, Rosa C. J. Kraaijveld, Jin Ouyang

arXiv:2604.16034 | 9.0 | h-index: 3

AI Analysis

For clinicians and researchers in medical imaging, this work provides a systematic framework to select appropriate XAI methods, addressing a critical obstacle to clinical adoption of AI.

This study comprehensively evaluates and ranks 13 XAI methods across 24 metrics for head and neck cancer outcome prediction, finding that Integrated Gradients and DeepLIFT consistently achieve high rankings for faithfulness, complexity, and plausibility.

For head and neck cancer (HNC) patients, prognostic outcome prediction can support the selection of personalized treatment strategies. Improving the prediction of HNC outcomes with advanced artificial intelligence (AI) techniques applied to PET/CT data has been explored extensively. However, the interpretability of AI remains a critical obstacle to its clinical adoption. Unlike previous HNC studies, which selected explainable AI (XAI) techniques empirically, we are the first to comprehensively evaluate and rank 13 XAI methods across 24 metrics covering faithfulness, robustness, complexity, and plausibility. Experimental results on the multi-center HECKTOR challenge dataset show large variations among XAI methods across these evaluation aspects, with Integrated Gradients (IG) and DeepLIFT (DL) consistently obtaining high rankings for faithfulness, complexity, and plausibility. This work highlights the importance of comprehensive XAI method evaluation and can be extended to other medical imaging tasks.
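The abstract does not say how per-metric scores are combined into a ranking, so the following is only a minimal sketch of one common approach: rank the methods within each metric, then average the ranks. The function names (`rank_scores`, `mean_rank`), the metric names, the method names, and all numbers are hypothetical placeholders, not values or procedures from the paper.

```python
from statistics import mean


def rank_scores(scores, higher_is_better=True):
    """Map each method to its rank (1 = best); tied methods share the average rank."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j over the run of methods tied with ordered[i].
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared = ((i + 1) + (j + 1)) / 2  # average of the tied positions
        for k in range(i, j + 1):
            ranks[ordered[k]] = shared
        i = j + 1
    return ranks


def mean_rank(metric_table, higher_is_better):
    """Average each method's rank over all metrics in metric_table."""
    per_metric = [
        rank_scores(scores, higher_is_better[name])
        for name, scores in metric_table.items()
    ]
    methods = list(per_metric[0])
    return {m: mean(r[m] for r in per_metric) for m in methods}


# Made-up toy scores for three placeholder methods on two placeholder metrics.
toy_scores = {
    "faithfulness_metric": {"method_a": 0.8, "method_b": 0.7, "method_c": 0.4},
    "complexity_metric": {"method_a": 0.3, "method_b": 0.2, "method_c": 0.6},
}
# Assumed directions: higher faithfulness is better, lower complexity is better.
directions = {"faithfulness_metric": True, "complexity_metric": False}

result = mean_rank(toy_scores, directions)
```

Averaging ranks rather than raw scores keeps metrics with different scales and directions comparable, which matters when many heterogeneous metrics are grouped into aspects such as faithfulness or complexity. Other aggregation schemes are equally plausible.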
