Philipp Weiß

HC
4papers
253citations
Novelty20%
AI Score23

4 Papers

LGApr 25, 2022Code
Do Users Benefit From Interpretable Vision? A User Study, Baseline, And Dataset

Leon Sixt, Martin Schuessler, Oana-Iuliana Popescu et al.

A variety of methods exist to explain image classification models. However, whether they provide any benefit to users over simply comparing various inputs and the model's respective predictions remains unclear. We conducted a user study (N=240) to test how such a baseline explanation technique performs against concept-based and counterfactual explanations. To this end, we contribute a synthetic dataset generator capable of biasing individual attributes and quantifying their relevance to the model. In a study, we assess if participants can identify the relevant set of attributes compared to the ground-truth. Our results show that the baseline outperformed concept-based explanations. Counterfactual explanations from an invertible neural network performed similarly as the baseline. Still, they allowed users to identify some attributes more accurately. Our results highlight the importance of measuring how well users can reason about biases of a model, rather than solely relying on technical evaluations or proxy tasks. We open-source our study and dataset so it can serve as a blue-print for future studies. For code see, https://github.com/berleon/do_users_benefit_from_interpretable_vision

AIMay 6, 2021Code
Two4Two: Evaluating Interpretable Machine Learning - A Synthetic Dataset For Controlled Experiments

Martin Schuessler, Philipp Weiß, Leon Sixt

A growing number of approaches exist to generate explanations for image classification. However, few of these approaches are subjected to human-subject evaluations, partly because it is challenging to design controlled experiments with natural image datasets, as they leave essential factors out of the researcher's control. With our approach, researchers can describe their desired dataset with only a few parameters. Based on these, our library generates synthetic image data of two 3D abstract animals. The resulting data is suitable for algorithmic as well as human-subject evaluations. Our user study results demonstrate that our method can create biases predictive enough for a classifier and subtle enough to be noticeable only to every second participant inspecting the data visually. Our approach significantly lowers the barrier for conducting human subject evaluations, thereby facilitating more rigorous investigations into interpretable machine learning. For our library and datasets see, https://github.com/mschuessler/two4two/

HCFeb 3, 2020
Evaluating Saliency Map Explanations for Convolutional Neural Networks: A User Study

Ahmed Alqaraawi, Martin Schuessler, Philipp Weiß et al.

Convolutional neural networks (CNNs) offer great machine learning performance over a range of applications, but their operation is hard to interpret, even for experts. Various explanation algorithms have been proposed to address this issue, yet limited research effort has been reported concerning their user evaluation. In this paper, we report on an online between-group user study designed to evaluate the performance of "saliency maps" - a popular explanation algorithm for image classification applications of CNNs. Our results indicate that saliency maps produced by the LRP algorithm helped participants to learn about some specific image features the system is sensitive to. However, the maps seem to provide very limited help for participants to anticipate the network's output for new images. Drawing on our findings, we highlight implications for design and further research on explainable AI. In particular, we argue the HCI and AI communities should look beyond instance-level explanations.

HCMay 8, 2019
Minimalistic Explanations: Capturing the Essence of Decisions

Martin Schuessler, Philipp Weiß

The use of complex machine learning models can make systems opaque to users. Machine learning research proposes the use of post-hoc explanations. However, it is unclear if they give users insights into otherwise uninterpretable models. One minimalistic way of explaining image classifications by a deep neural network is to show only the areas that were decisive for the assignment of a label. In a pilot study, 20 participants looked at 14 of such explanations generated either by a human or the LIME algorithm. For explanations of correct decisions, they identified the explained object with significantly higher accuracy (75.64% vs. 18.52%). We argue that this shows that explanations can be very minimalistic while retaining the essence of a decision, but the decision-making contexts that can be conveyed in this manner is limited. Finally, we found that explanations are unique to the explainer and human-generated explanations were assigned 79% higher trust ratings. As a starting point for further studies, this work shares our first insights into quality criteria of post-hoc explanations.