CV · Dec 2, 2022
Evaluation of Explanation Methods of AI -- CNNs in Image Classification Tasks with Reference-based and No-reference Metrics
A. Zhukov, J. Benois-Pineau, R. Giot
The most popular methods in the AI and machine learning paradigm are mainly black boxes, which makes explaining AI decisions an urgent need. Although dedicated explanation tools have been developed in large numbers, evaluating their quality remains an open research question. In this paper, we generalize the methodologies for evaluating post-hoc explainers of CNNs' decisions in visual classification tasks, using reference-based and no-reference metrics. We apply them to our previously developed explainers (FEM, MLFEM) and to the popular Grad-CAM. The reference-based metrics are the Pearson correlation coefficient and Similarity, computed between the explanation map and its ground truth, a Gaze Fixation Density Map obtained in a psycho-visual experiment. As a no-reference metric, we use the stability metric proposed by Alvarez-Melis and Jaakkola. We study its behaviour and its consensus with the reference-based metrics, and show that under several kinds of degradation of the input images, this metric agrees with the reference-based ones. Therefore, it can be used to evaluate the quality of explainers when the ground truth is not available.
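The three metrics named in this abstract can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: that Similarity is the histogram-intersection measure common in saliency evaluation, that the stability metric is a local Lipschitz estimate (the largest ratio of explanation change to input change over a set of perturbed inputs), and that maps are non-negative arrays of equal shape. The function names and the `explain` callable are hypothetical and do not come from the paper.

```python
import numpy as np


def pearson_cc(explanation, reference):
    """Pearson correlation coefficient between two maps of equal shape."""
    e = np.asarray(explanation, dtype=float).ravel()
    r = np.asarray(reference, dtype=float).ravel()
    e = e - e.mean()
    r = r - r.mean()
    denom = np.sqrt((e ** 2).sum() * (r ** 2).sum())
    # Correlation is undefined for a constant map; returning 0.0 is a convention chosen here.
    return float((e * r).sum() / denom) if denom > 0 else 0.0


def similarity(explanation, reference):
    """Histogram intersection (assumed definition of Similarity).

    Each non-negative map is scaled to sum to 1, then the element-wise
    minimum of the two maps is summed. The result lies in [0, 1].
    """
    e = np.asarray(explanation, dtype=float).ravel()
    r = np.asarray(reference, dtype=float).ravel()
    if e.sum() <= 0 or r.sum() <= 0:
        return 0.0
    return float(np.minimum(e / e.sum(), r / r.sum()).sum())


def local_lipschitz(explain, image, perturbed_images):
    """Assumed form of the stability metric: a local Lipschitz estimate.

    `explain` is a hypothetical callable mapping an image to an explanation map.
    Lower values mean the explanation changes less when the input changes.
    """
    image = np.asarray(image, dtype=float)
    base = np.asarray(explain(image), dtype=float)
    ratios = []
    for p in perturbed_images:
        p = np.asarray(p, dtype=float)
        dx = np.linalg.norm(p - image)
        if dx == 0:
            continue  # skip perturbations identical to the original image
        dy = np.linalg.norm(np.asarray(explain(p), dtype=float) - base)
        ratios.append(dy / dx)
    return float(max(ratios)) if ratios else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gfdm = rng.random((8, 8))                        # stand-in for a Gaze Fixation Density Map
    saliency = gfdm + 0.1 * rng.random((8, 8))       # stand-in for an explanation map
    print("PCC:", pearson_cc(saliency, gfdm))
    print("SIM:", similarity(saliency, gfdm))
    noisy = [gfdm + 0.05 * rng.standard_normal((8, 8)) for _ in range(5)]
    print("Stability:", local_lipschitz(lambda x: x ** 2, gfdm, noisy))
```

The first two functions need a ground-truth map, while the third only needs the explainer and perturbed copies of the input, which is what makes it usable when no gaze data is available.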
HC · Oct 21, 2019
Toward automatic comparison of visualization techniques: Application to graph visualization
L. Giovannangeli, R. Bourqui, R. Giot et al.
Many end-user evaluations of data visualization techniques have been run over the last decades. Their results are cornerstones for building efficient visualization systems. However, designing such an evaluation is always complex and time-consuming, and it may end in a lack of statistical evidence and reproducibility. We believe that modern and efficient computer vision techniques, such as deep convolutional neural networks (CNNs), may help visualization researchers to build and/or adjust their evaluation hypotheses. The basis of our idea is to train machine learning models on several visualization techniques to solve a specific task. Our assumption is that the efficiency of visualization techniques can be compared based on the performance of their corresponding models. As current machine learning models are not able to strictly reflect human capabilities, including their imperfections, such results should be interpreted with caution. However, we think that using a machine learning-based pre-evaluation, as a pre-process to standard user evaluations, should help researchers to study their design space more exhaustively. Thus, it should improve their final user evaluation by providing it with better test cases. In this paper, we present the results of two experiments we conducted to assess how correlated the performance of users and of computer vision techniques can be. The study compares two mainstream graph visualization techniques: node-link (NL) and adjacency-matrix (MD) diagrams. Using two well-known deep convolutional neural networks, we partially reproduced user evaluations from Ghoniem et al. and from Okoe et al. These experiments showed that some user evaluation results can be reproduced automatically.