Paul Conway

SDJan 26

A Framework for Evaluating Faithfulness in Explainable AI for Machine Anomalous Sound Detection Using Frequency-Band Perturbation

Alexander Buck, Georgina Cosma, Iain Phillips et al.

Explainable AI (XAI) is commonly applied to anomalous sound detection (ASD) models to identify which time-frequency regions of an audio signal contribute to an anomaly decision. However, most audio explanations rely on qualitative inspection of saliency maps, leaving open the question of whether these attributions accurately reflect the spectral cues the model uses. In this work, we introduce a new quantitative framework for evaluating XAI faithfulness in machine-sound analysis by directly linking attribution relevance to model behaviour through systematic frequency-band removal. This approach provides an objective measure of whether an XAI method for machine ASD correctly identifies frequency regions that influence an ASD model's predictions. By using four widely adopted methods, namely Integrated Gradients, Occlusion, Grad-CAM and SmoothGrad, we show that XAI techniques differ in reliability, with Occlusion demonstrating the strongest alignment with true model sensitivity and gradient-+based methods often failing to accurately capture spectral dependencies. The proposed framework offers a reproducible way to benchmark audio explanations and enables more trustworthy interpretation of spectrogram-based ASD systems.

HCMay 3, 2023

Judgments of research co-created by generative AI: experimental evidence

Paweł Niszczota, Paul Conway

The introduction of ChatGPT has fuelled a public debate on the use of generative AI (large language models; LLMs), including its use by researchers. In the current work, we test whether delegating parts of the research process to LLMs leads people to distrust and devalue researchers and scientific output. Participants (N=402) considered a researcher who delegates elements of the research process to a PhD student or LLM, and rated (1) moral acceptability, (2) trust in the scientist to oversee future projects, and (3) the accuracy and quality of the output. People judged delegating to an LLM as less acceptable than delegating to a human (d = -0.78). Delegation to an LLM also decreased trust to oversee future research projects (d = -0.80), and people thought the results would be less accurate and of lower quality (d = -0.85). We discuss how this devaluation might transfer into the underreporting of generative AI use.

Paul Conway

2 Papers