Unmasking Clever Hans Predictors and Assessing What Machines Really Learn
This work addresses the interpretability and reliability of machine learning models for researchers and practitioners. It offers a method for assessing whether a model solves a problem as intended, building incrementally on existing explanation techniques.
The paper tackles the problem of understanding what machine learning models actually learn, revealing that high-accuracy models can exhibit naive or short-sighted behaviors not captured by standard metrics. It proposes Spectral Relevance Analysis to characterize and validate model behavior, aiming to provide a more nuanced evaluation of machine intelligence successes.
Current learning machines have successfully solved hard application problems, reaching high accuracy and displaying seemingly "intelligent" behavior. Here we apply recent techniques for explaining decisions of state-of-the-art learning machines and analyze various tasks from computer vision and arcade games. This showcases a spectrum of problem-solving behaviors ranging from naive and short-sighted to well-informed and strategic. We observe that standard performance evaluation metrics can fail to distinguish these diverse problem-solving behaviors. Furthermore, we propose our semi-automated Spectral Relevance Analysis, which provides a practically effective way of characterizing and validating the behavior of nonlinear learning machines. This helps to assess whether a learned model indeed delivers reliably for the problem that it was conceived for. Finally, our work intends to add a voice of caution to the ongoing excitement about machine intelligence and calls for evaluating and judging some of these recent successes in a more nuanced manner.
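The observation that standard metrics can miss differences in problem-solving behavior can be illustrated with a toy sketch. Everything in it is a hypothetical assumption made for illustration and is not taken from the paper: the six-sample test set, the `source_tag` artifact, and the two hand-written models `informed_model` and `clever_hans_model`. It does not implement Spectral Relevance Analysis; it only shows two decision strategies that score identically until the artifact is removed.

```python
# Hypothetical toy data: each sample is (real_feature, source_tag, label).
# In this made-up set the tag happens to coincide with the label, the way
# an artifact might co-occur with one class in a benchmark.
test_set = [
    (0.9, 1, 1),
    (0.8, 1, 1),
    (0.7, 1, 1),
    (0.2, 0, 0),
    (0.1, 0, 0),
    (0.3, 0, 0),
]


def informed_model(real_feature, source_tag):
    """Decides from the feature that actually defines the class."""
    return 1 if real_feature > 0.5 else 0


def clever_hans_model(real_feature, source_tag):
    """Decides from the artifact alone and ignores the real feature."""
    return source_tag


def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(model(x, tag) == label for x, tag, label in samples)
    return correct / len(samples)


def strip_tag(samples):
    """Same samples with the artifact removed (tag set to 0 everywhere)."""
    return [(x, 0, label) for x, _tag, label in samples]


# On the original test set the two strategies are indistinguishable.
acc_informed = accuracy(informed_model, test_set)
acc_hans = accuracy(clever_hans_model, test_set)

# Once the artifact is removed, only the informed strategy keeps its score.
acc_informed_clean = accuracy(informed_model, strip_tag(test_set))
acc_hans_clean = accuracy(clever_hans_model, strip_tag(test_set))

print(acc_informed, acc_hans, acc_informed_clean, acc_hans_clean)
```

In this sketch, test accuracy alone cannot separate the two models. The difference appears only when the input is altered or when one inspects which input components each decision relies on, which is the role explanation techniques play in the work described above.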