Actively learning a Bayesian matrix fusion model with deep side information
This work addresses the challenge of efficiently aligning high-dimensional deep features with human responses. The contribution is incremental: it builds on existing Bayesian matrix factorization methods by adding active learning to make data collection more efficient.
The paper tackles the costly data collection needed to align deep neural network representations with human annotations. It proposes an active learning approach that adaptively samples stimuli to learn a Bayesian matrix factorization model with deep side information. The authors report a significant efficiency gain over a passive baseline, and the method applies to both small lab datasets and large-scale crowdsourced settings.
High-dimensional deep neural network representations of images and concepts can be aligned to predict human annotations of diverse stimuli. However, such alignment requires the costly collection of behavioral responses, such that, in practice, the deep-feature spaces are only ever sparsely sampled. Here, we propose an active learning approach to adaptively sampling experimental stimuli to efficiently learn a Bayesian matrix factorization model with deep side information. We observe a significant efficiency gain over a passive baseline. Furthermore, with a sequential batched sampling strategy, the algorithm is applicable not only to small datasets collected from traditional laboratory experiments but also to settings where large-scale crowdsourced data collection is needed to accurately align the high-dimensional deep feature representations derived from pre-trained networks.
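To make the idea of sequential batched sampling concrete, the following is a minimal sketch and not the authors' model. It swaps the Bayesian matrix factorization for a much simpler stand-in, Bayesian linear regression from deep features to a scalar response, and it assumes a predictive-variance acquisition rule, which the abstract does not specify. All function names (`posterior`, `select_batch`, `run_active_learning`) and the parameters `alpha` and `noise_var` are hypothetical choices made for illustration.

```python
import numpy as np


def posterior(X, y, alpha=1.0, noise_var=0.1):
    """Gaussian posterior over weights for y = X w + noise, prior w ~ N(0, I / alpha)."""
    d = X.shape[1]
    precision = alpha * np.eye(d) + X.T @ X / noise_var
    cov = np.linalg.inv(precision)
    mean = cov @ X.T @ y / noise_var
    return mean, cov


def select_batch(cov, pool, batch_size, noise_var=0.1):
    """Greedily pick the pool rows with the largest predictive variance.

    After each pick the covariance gets a rank-one (Sherman-Morrison) update.
    In a linear-Gaussian model this update needs no response, so a whole batch
    can be chosen before any data is collected, and near-duplicate stimuli
    are discouraged within the batch.
    """
    cov = cov.copy()
    available = np.ones(len(pool), dtype=bool)
    chosen = []
    for _ in range(min(batch_size, len(pool))):
        # Predictive variance x^T C x for every candidate stimulus.
        var = np.einsum("ij,jk,ik->i", pool, cov, pool)
        var[~available] = -np.inf
        i = int(np.argmax(var))
        chosen.append(i)
        available[i] = False
        x = pool[i]
        cx = cov @ x
        cov -= np.outer(cx, cx) / (noise_var + x @ cx)
    return chosen


def run_active_learning(features, oracle, n_rounds=5, batch_size=4,
                        alpha=1.0, noise_var=0.1):
    """Alternate between choosing a batch and refitting on all responses so far.

    `oracle(j)` stands in for collecting a human response to stimulus j.
    """
    n, d = features.shape
    labeled, responses = [], []
    mean, cov = np.zeros(d), np.eye(d) / alpha  # start from the prior
    for _ in range(n_rounds):
        seen = set(labeled)
        pool_idx = np.array([i for i in range(n) if i not in seen])
        picks = select_batch(cov, features[pool_idx], batch_size, noise_var)
        new = pool_idx[picks]
        labeled.extend(new.tolist())
        responses.extend(oracle(j) for j in new)
        mean, cov = posterior(features[labeled], np.array(responses),
                              alpha, noise_var)
    return mean, labeled
```

A passive baseline, under the same assumptions, would replace `select_batch` with a uniform random draw from the pool; comparing the two at equal labeling budgets is one way to measure the kind of efficiency gain the abstract describes.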