On the reliability of feature attribution methods for speech classification
This work addresses the problem of interpreting complex models, which matters to researchers and practitioners in speech processing, but its contribution is incremental because it focuses on evaluating existing methods.
The study investigates the reliability of standard feature attribution methods in speech classification and finds them generally unreliable, except for word-aligned perturbation methods on word-based tasks.
As the capabilities of large-scale pre-trained models evolve, understanding the determinants of their outputs becomes more important. Feature attribution aims to reveal which input elements contribute the most to model outputs. In speech processing, the unique characteristics of the input signal make the application of feature attribution methods challenging. We study how factors such as input type, as well as aggregation and perturbation timespan, impact the reliability of standard feature attribution methods, and how these factors interact with the characteristics of each classification task. We find that standard approaches to feature attribution are generally unreliable when applied to the speech domain, with the exception of word-aligned perturbation methods when applied to word-based classification tasks.
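To make the idea of word-aligned perturbation concrete, the following is a minimal sketch of a generic occlusion-style attribution, not the procedure used in the study. It assumes that word boundaries are already available as sample-index spans (for example from a forced aligner) and that the classifier is exposed as a function returning a single score for the target class. The names `word_aligned_occlusion`, `score_fn`, `word_spans`, and `toy_score` are hypothetical and introduced only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Waveform = List[float]
Span = Tuple[int, int]  # [start, end) sample indices of one word (assumed given)


def word_aligned_occlusion(
    waveform: Sequence[float],
    word_spans: Sequence[Span],
    score_fn: Callable[[Waveform], float],
    baseline: float = 0.0,
) -> List[float]:
    """Return one attribution per word: the drop in score_fn when the
    samples of that word are replaced by a constant baseline value."""
    original = list(waveform)
    reference_score = score_fn(original)
    attributions = []
    for start, end in word_spans:
        end = min(end, len(original))  # keep the waveform length unchanged
        perturbed = list(original)
        perturbed[start:end] = [baseline] * max(0, end - start)
        attributions.append(reference_score - score_fn(perturbed))
    return attributions


# Toy stand-in for a classifier (an assumption, not a real model):
# the score is the mean energy of samples 4..7, so only the second
# "word" can influence it.
def toy_score(x: Waveform) -> float:
    segment = x[4:8]
    return sum(s * s for s in segment) / len(segment)


waveform = [0.5] * 12
word_spans = [(0, 4), (4, 8), (8, 12)]
scores = word_aligned_occlusion(waveform, word_spans, toy_score)
print(scores)  # [0.0, 0.25, 0.0]
```

In this toy setting only the second span receives a non-zero attribution, because it is the only region the stand-in scorer looks at. With a real speech model, the choice of baseline (silence, noise, or something else) and of the perturbed timespan are exactly the kind of factors whose effect on reliability the abstract describes.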