The Clever Hans Effect in Unsupervised Learning
This work addresses risks for AI practitioners who use unsupervised learning in applications such as foundation models, though its contribution is incremental: it highlights an existing issue and supports it with new evidence.
The paper tackles the Clever Hans effect, in which unsupervised learning models make accurate predictions for the wrong reasons. Through empirical and theoretical analysis, it finds that the effect is widespread and identifies inductive biases as a primary source.
Unsupervised learning has become an essential building block of AI systems. The representations it produces, e.g. in foundation models, are critical to a wide variety of downstream applications. It is therefore important to carefully examine unsupervised models to ensure not only that they produce accurate predictions, but also that these predictions are not "right for the wrong reasons", the so-called Clever Hans (CH) effect. Using specially developed Explainable AI techniques, we show for the first time that CH effects are widespread in unsupervised learning. Our empirical findings are enriched by theoretical insights, which interestingly point to inductive biases in the unsupervised learning machine as a primary source of CH effects. Overall, our work sheds light on unexplored risks associated with practical applications of unsupervised learning and suggests ways to make unsupervised learning more robust.
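To make "right for the wrong reasons" concrete, here is a minimal toy sketch. It is not the paper's method, data, or Explainable AI technique; the two-feature setup, the choice of PCA as the unsupervised model, the nearest-centroid downstream classifier, and all names are assumptions made for illustration. PCA favors high-variance directions (an inductive bias), so it encodes a strong artifact that happens to follow the label in the training data. The downstream classifier looks accurate until the artifact stops following the label.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, artifact_follows_label):
    """Hypothetical toy data: a weak genuine feature and a strong artifact feature."""
    y = rng.choice([-1.0, 1.0], size=n)
    # If the artifact is decorrelated, its sign is drawn independently of the label.
    s = y if artifact_follows_label else rng.choice([-1.0, 1.0], size=n)
    genuine = 1.0 * y + 0.5 * rng.standard_normal(n)
    artifact = 5.0 * s + 0.5 * rng.standard_normal(n)
    return np.column_stack([genuine, artifact]), y

# Unsupervised step: fit a 1-D PCA representation without using labels.
X_train, y_train = make_data(1000, artifact_follows_label=True)
mean = X_train.mean(axis=0)
_, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
direction = vt[0]  # first principal direction

def encode(X):
    """Project inputs onto the learned 1-D representation."""
    return (X - mean) @ direction

# Downstream step: nearest-centroid classifier on the learned representation.
z_train = encode(X_train)
c_pos = z_train[y_train > 0].mean()
c_neg = z_train[y_train < 0].mean()

def predict(X):
    z = encode(X)
    return np.where(np.abs(z - c_pos) < np.abs(z - c_neg), 1.0, -1.0)

def accuracy(X, y):
    return float(np.mean(predict(X) == y))

# Evaluate with the artifact intact, then with the artifact decorrelated from the label.
X_same, y_same = make_data(1000, artifact_follows_label=True)
X_shift, y_shift = make_data(1000, artifact_follows_label=False)

acc_with_artifact = accuracy(X_same, y_same)
acc_without_artifact = accuracy(X_shift, y_shift)
artifact_weight = float(abs(direction[1]))  # how much the representation relies on the artifact

print(f"weight on artifact feature: {artifact_weight:.2f}")
print(f"accuracy, artifact follows label: {acc_with_artifact:.2f}")
print(f"accuracy, artifact decorrelated:  {acc_without_artifact:.2f}")
```

In this sketch the representation places almost all of its weight on the artifact, so accuracy should be near perfect while the artifact follows the label and fall to roughly chance level once it does not, even though the genuine feature is still present in the input. Standard held-out evaluation on data that shares the artifact would not reveal the problem.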