Extracting low-dimensional psychological representations from convolutional neural networks
This work addresses an interpretability challenge faced by researchers who use deep learning in cognitive modeling. Its contribution is incremental, since it builds on existing predictive methods.
The authors tackle the problem of interpreting the high-dimensional convolutional neural network representations used to predict human similarity judgments. They develop a method that reduces these representations to a low-dimensional space that remains predictive of those judgments and offers explanatory insight.
Deep neural networks are increasingly being used in cognitive modeling as a means of deriving representations for complex stimuli such as images. While the predictive power of these networks is high, it is often not clear whether they also offer useful explanations of the task at hand. Convolutional neural network representations have been shown to be predictive of human similarity judgments for images after appropriate adaptation. However, these high-dimensional representations are difficult to interpret. Here we present a method for reducing these representations to a low-dimensional space which is still predictive of similarity judgments. We show that these low-dimensional representations also provide insightful explanations of factors underlying human similarity judgments.
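To make the idea of a low-dimensional space that is still predictive of similarity judgments concrete, here is a minimal sketch. It is not the authors' method, which the abstract does not specify. It assumes that similarity between two images is modeled as the inner product of their projected features, and it fits a linear projection in two steps: a rank-k eigendecomposition of the similarity matrix, then a least-squares map from features to that embedding. The names fit_low_dim_projection and predicted_similarity, the synthetic features, and the choice k = 2 are all hypothetical.

```python
import numpy as np


def fit_low_dim_projection(features, similarity, k):
    """Return W (d x k) such that (features @ W) @ (features @ W).T
    approximates the similarity matrix.

    Illustrative two-step fit (an assumption, not the paper's procedure):
    1. take the best rank-k positive semidefinite approximation of the
       similarity matrix from its top-k eigenpairs;
    2. regress that k-dimensional embedding onto the features.
    """
    # Symmetrize in case the judgments are slightly asymmetric.
    sym = 0.5 * (similarity + similarity.T)
    eigvals, eigvecs = np.linalg.eigh(sym)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]     # indices of the k largest
    lam = np.clip(eigvals[top], 0.0, None)  # guard against tiny negatives
    target = eigvecs[:, top] * np.sqrt(lam)  # n x k embedding
    # Least-squares map from the d-dimensional features to the embedding.
    W, *_ = np.linalg.lstsq(features, target, rcond=None)
    return W


def predicted_similarity(features, W):
    """Similarity predicted from the k-dimensional projected features."""
    Z = features @ W
    return Z @ Z.T


# Synthetic stand-ins: 40 "images" with 64-dimensional "network features",
# and similarities that truly depend on only 2 latent dimensions.
rng = np.random.default_rng(0)
n, d, k = 40, 64, 2
features = rng.normal(size=(n, d))
true_W = rng.normal(size=(d, k))
similarity = predicted_similarity(features, true_W)

W = fit_low_dim_projection(features, similarity, k)
pred = predicted_similarity(features, W)

# Correlate predicted and observed similarities over distinct pairs.
iu = np.triu_indices(n, k=1)
r = np.corrcoef(pred[iu], similarity[iu])[0, 1]
print(f"projection shape: {W.shape}, correlation: {r:.3f}")
```

With real data, the columns of features @ W would be the candidate dimensions to inspect for interpretability, for example by sorting images along each column. The fit should be evaluated on held-out image pairs rather than on the pairs used for fitting.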