Improving neural network representations using human similarity judgments
This work addresses the weakly constrained global organization of neural network representations in computer vision and offers a method that improves performance on downstream tasks such as few-shot learning.
The authors observe that standard training objectives leave the global structure of neural network representations largely unconstrained. They align that structure with human similarity judgments while preserving local structure, which improves accuracy on few-shot learning and anomaly detection tasks.
Deep neural networks have reached human-level performance on many computer vision tasks. However, the objectives used to train these networks enforce only that similar images are embedded at similar locations in the representation space, and do not directly constrain the global structure of the resulting space. Here, we explore the impact of supervising this global structure by linearly aligning it with human similarity judgments. We find that a naive approach leads to large changes in local representational structure that harm downstream performance. Thus, we propose a novel method that aligns the global structure of representations while preserving their local structure. This global-local transform considerably improves accuracy across a variety of few-shot learning and anomaly detection tasks. Our results indicate that human visual representations are globally organized in a way that facilitates learning from few examples, and incorporating this global structure into neural network representations improves performance on downstream tasks.
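To make the idea of linear alignment concrete, the following is a minimal sketch and not the authors' objective. It assumes a hypothetical human similarity matrix `S` over `n` images and a representation matrix `X`, and fits a square linear map `W` so that `X @ W @ X.T` approximates `S`. A penalty on the distance between `W` and the identity serves here as a simple stand-in for preserving the original representation; the function names, the squared-error loss, the penalty weight `lam`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def objective(X, S, W, lam):
    """Squared alignment error plus a penalty on drifting away from the identity."""
    d = X.shape[1]
    return np.sum((X @ W @ X.T - S) ** 2) + lam * np.sum((W - np.eye(d)) ** 2)


def fit_transform(X, S, lam=1.0, steps=200):
    """Gradient descent on the objective above, starting from W = I (hypothetical helper)."""
    d = X.shape[1]
    eye = np.eye(d)
    W = eye.copy()
    # The objective is quadratic in W; 2 * (smax**4 + lam) bounds its curvature,
    # so a step size of 1 / that bound never increases the objective.
    smax = np.linalg.norm(X, 2)  # largest singular value of X
    lr = 1.0 / (2.0 * (smax ** 4 + lam))
    for _ in range(steps):
        residual = X @ W @ X.T - S
        grad = 2.0 * X.T @ residual @ X + 2.0 * lam * (W - eye)
        W = W - lr * grad
    return W


# Synthetic stand-ins: 20 "images" with 5-dimensional representations, and a
# symmetric similarity matrix generated from a hidden linear map.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
B = rng.normal(size=(5, 5))
S = X @ (B @ B.T) @ X.T

W_weak = fit_transform(X, S, lam=0.01)   # weak penalty: W moves freely
W_strong = fit_transform(X, S, lam=1e4)  # strong penalty: W stays near I

drift_weak = np.linalg.norm(W_weak - np.eye(5))
drift_strong = np.linalg.norm(W_strong - np.eye(5))
print(f"drift from identity: weak={drift_weak:.3f}, strong={drift_strong:.3f}")
```

In this toy setting, a larger `lam` keeps the transform closer to the identity at the cost of a looser fit to `S`, which loosely mirrors the trade-off the abstract describes between aligning global structure and disturbing the existing representation.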