CL · Sep 7, 2018

Using Sparse Semantic Embeddings Learned from Multimodal Text and Image Data to Model Human Conceptual Knowledge

Steven Derby, Paul Miller, Brian Murphy, Barry Devereux

arXiv:1809.02534v3 32.01091 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for more human-like semantic models in AI and cognitive science, though it appears incremental as it builds on existing distributional methods with multimodal data.

The paper addresses the problem that the dimensions of dense semantic embeddings do not resemble human semantic knowledge. It combines text-based and image-based representations to produce sparse, interpretable vectors, and it shows through comparisons with behavioural and neuroimaging data that these vectors can predict human semantic knowledge.

Distributional models provide a convenient way to model semantics using dense embedding spaces derived from unsupervised learning algorithms. However, the dimensions of dense embedding spaces are not designed to resemble human semantic knowledge. Moreover, embeddings are often built from a single source of information (typically text data), even though neurocognitive research suggests that semantics is deeply linked to both language and perception. In this paper, we combine multimodal information from both text and image-based representations derived from state-of-the-art distributional models to produce sparse, interpretable vectors using Joint Non-Negative Sparse Embedding. Through in-depth analyses comparing these sparse models to human-derived behavioural and neuroimaging data, we demonstrate their ability to predict interpretable linguistic descriptions of human ground-truth semantic knowledge.
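To make the idea of a joint sparse factorisation concrete, here is a minimal sketch. It assumes the usual Joint Non-Negative Sparse Embedding formulation, in which a text matrix X and an image matrix Y are reconstructed from one shared, non-negative, L1-penalised code matrix A and two dictionaries with bounded row norms. The abstract does not state the objective or the solver, so the function names (`jnnse_sketch`, `objective`), the parameters (`k`, `lam`, `n_iter`) and the simple proximal-gradient updates are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def _clip_rows(D):
    """Project each dictionary row onto the unit L2 ball."""
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    return D / np.maximum(norms, 1.0)


def objective(X, Y, A, Dx, Dy, lam):
    """Joint reconstruction error plus an L1 penalty on the shared codes."""
    return (np.sum((X - A @ Dx) ** 2) + np.sum((Y - A @ Dy) ** 2)
            + lam * np.abs(A).sum())


def jnnse_sketch(X, Y, k, lam=0.1, n_iter=200, seed=0):
    """Toy alternating solver (an assumption, not the published one) for

        min ||X - A Dx||_F^2 + ||Y - A Dy||_F^2 + lam * ||A||_1
        s.t. A >= 0, and every row of Dx and Dy has L2 norm <= 1.

    X: (n_words, p_text), Y: (n_words, p_image). A is shared by both views.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    A = np.abs(rng.standard_normal((n, k)))
    Dx = _clip_rows(rng.standard_normal((k, X.shape[1])))
    Dy = _clip_rows(rng.standard_normal((k, Y.shape[1])))
    eps = 1e-12  # guards against division by zero if a factor collapses

    for _ in range(n_iter):
        # A-step: one proximal-gradient step. The prox of the L1 penalty
        # combined with the non-negativity constraint is a shifted clip at 0.
        L_a = 2.0 * np.linalg.norm(Dx @ Dx.T + Dy @ Dy.T, 2) + eps
        grad_a = 2.0 * ((A @ Dx - X) @ Dx.T + (A @ Dy - Y) @ Dy.T)
        A = np.maximum(A - (grad_a + lam) / L_a, 0.0)

        # D-step: one projected-gradient step per dictionary.
        L_d = 2.0 * np.linalg.norm(A.T @ A, 2) + eps
        Dx = _clip_rows(Dx - 2.0 * A.T @ (A @ Dx - X) / L_d)
        Dy = _clip_rows(Dy - 2.0 * A.T @ (A @ Dy - Y) / L_d)

    return A, Dx, Dy


if __name__ == "__main__":
    # Random stand-ins for text and image embeddings of 50 concepts.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 20))
    Y = rng.standard_normal((50, 12))
    A, Dx, Dy = jnnse_sketch(X, Y, k=10)
    print("share of zero entries in A:", float((A == 0).mean()))
```

Each row of `A` is the sparse vector for one concept. Under this setup, a dimension would be interpreted by listing the concepts with the largest weights in the corresponding column of `A`.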
