Evaluating saliency methods on artificial data with different background types
This work addresses the need to evaluate xAI methods more rigorously before they are used in clinical or other safety-critical applications; the contribution is incremental, as it builds on existing concerns about how such methods are evaluated.
The researchers tackled the problem of objectively evaluating saliency methods in explainable AI by developing a framework that generates artificial data with synthetic lesions and known ground-truth maps. They found that the resulting heatmaps vary strongly between saliency methods and between background types (Perlin noise and 2D brain MRI slices).
In recent years, many 'explainable artificial intelligence' (xAI) approaches have been developed, but these have not always been objectively evaluated. To evaluate the quality of heatmaps generated by various saliency methods, we developed a framework to generate artificial data with synthetic lesions and a known ground-truth map. Using this framework, we evaluated two data sets with different backgrounds, Perlin noise and 2D brain MRI slices, and found that the heatmaps vary strongly between saliency methods and backgrounds. We strongly encourage further evaluation of saliency maps and xAI methods using this framework before they are applied in clinical or other safety-critical settings.
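The abstract does not spell out how the artificial samples are generated or how heatmaps are scored against the ground truth. Below is a minimal Python/NumPy sketch of the general idea, under assumptions that are not taken from the paper: the background is 2D Perlin (gradient) noise, the "synthetic lesion" is a single additive bright disk whose pixels form the ground-truth mask, and a heatmap is scored by the fraction of its positive relevance that falls inside that mask. The function names (`perlin_noise`, `make_sample`, `relevance_mass_in_mask`), the lesion shape, contrast, and the metric are all hypothetical illustrations, not the authors' framework.

```python
import numpy as np


def fade(t):
    # Perlin's quintic smoothstep 6t^5 - 15t^4 + 10t^3 (smooth interpolation weight)
    return t * t * t * (t * (t * 6 - 15) + 10)


def perlin_noise(size, res, rng):
    """2D gradient (Perlin) noise on a size x size image with res x res lattice cells."""
    coords = np.arange(size) * res / size            # lattice coordinates in [0, res)
    Y, X = np.meshgrid(coords, coords, indexing="ij")
    yi, xi = np.floor(Y).astype(int), np.floor(X).astype(int)
    fy, fx = Y - yi, X - xi                          # position of each pixel inside its cell
    angles = 2 * np.pi * rng.random((res + 1, res + 1))
    gy, gx = np.sin(angles), np.cos(angles)          # random unit gradients at lattice corners

    def corner(oy, ox):
        # dot product of the corner gradient with the offset from that corner to the pixel
        return gy[yi + oy, xi + ox] * (fy - oy) + gx[yi + oy, xi + ox] * (fx - ox)

    u, v = fade(fx), fade(fy)
    top = corner(0, 0) * (1 - u) + corner(0, 1) * u      # interpolate along x, upper corners
    bottom = corner(1, 0) * (1 - u) + corner(1, 1) * u   # interpolate along x, lower corners
    return top * (1 - v) + bottom * v                    # interpolate along y


def make_sample(size=64, res=4, radius=6, contrast=1.0, seed=0):
    """Hypothetical generator: Perlin background plus one disk lesion; returns (image, mask)."""
    rng = np.random.default_rng(seed)
    background = perlin_noise(size, res, rng)
    cy, cx = rng.integers(radius, size - radius, size=2)  # keep the disk inside the image
    yy, xx = np.ogrid[:size, :size]
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2  # known ground-truth map
    image = background + contrast * mask                   # lesion = additive bright disk
    return image, mask


def relevance_mass_in_mask(heatmap, mask):
    """Illustrative score: share of positive heatmap relevance inside the ground-truth mask."""
    pos = np.clip(heatmap, 0, None)
    total = pos.sum()
    return float(pos[mask].sum() / total) if total > 0 else 0.0
```

For example, `image, mask = make_sample(seed=1)` yields one labelled sample; an "oracle" heatmap equal to `mask` scores 1.0, while a uniform heatmap scores only the lesion's area fraction, `mask.mean()`. A heatmap from any saliency method, computed on a classifier trained to detect the lesion, could be scored the same way; swapping in an MRI slice for `background` would mimic the second background type.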