CL IR LG
Oct 15, 2019

Context Matters: Recovering Human Semantic Structure from Machine Learning Analysis of Large-Scale Text Corpora

Marius Cătălin Iordan, Tyler Giallanza, Cameron T. Ellis, Nicole M. Beckage, Jonathan D. Cohen

arXiv:1910.06954v3
0.315 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of aligning computational models with human semantic understanding, a question of interest to researchers in both psychology and AI. Its contribution is incremental: it refines existing embedding methods rather than proposing a new class of model.

The paper tackles the discrepancy between machine learning predictions and human judgments of semantic similarity by introducing a contextually constrained embedding approach, which improves predictions of both similarity judgments and feature ratings.

Applying machine learning algorithms to large-scale, text-based corpora (embeddings) presents a unique opportunity to investigate at scale how human semantic knowledge is organized and how people use it to judge fundamental relationships, such as similarity between concepts. However, efforts to date have shown a substantial discrepancy between algorithm predictions and empirical judgments. Here, we introduce a novel approach of generating embeddings motivated by the psychological theory that semantic context plays a critical role in human judgments. Specifically, we train state-of-the-art machine learning algorithms using contextually-constrained text corpora and show that this greatly improves predictions of similarity judgments and feature ratings. By improving the correspondence between representations derived using embeddings generated by machine learning methods and empirical measurements of human judgments, the approach we describe helps advance the use of large-scale text corpora to understand the structure of human semantic representations.
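The abstract does not state which similarity measure or agreement statistic the authors use. As a minimal sketch, assuming that a model's predicted similarity for a word pair is the cosine between the two word vectors and that agreement with people is summarized by a Pearson correlation across pairs (a common choice, not confirmed by this listing), the comparison between embeddings and human judgments can be written as below. The function names, toy vectors, and toy ratings are hypothetical and purely illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def score_embedding(embeddings, human_ratings):
    """Correlate model-predicted pairwise similarities with human ratings.

    embeddings: dict mapping word -> vector (list of floats)
    human_ratings: dict mapping (word_a, word_b) -> mean human similarity rating
    """
    pairs = sorted(human_ratings)  # fixed order so both lists line up
    predicted = [cosine(embeddings[a], embeddings[b]) for a, b in pairs]
    observed = [human_ratings[p] for p in pairs]
    return pearson(predicted, observed)

# Hypothetical toy data: 3-dimensional vectors and made-up ratings on a 1-9 scale.
toy_vectors = {
    "dog": [0.9, 0.1, 0.0],
    "wolf": [0.8, 0.2, 0.1],
    "car": [0.0, 0.9, 0.4],
    "truck": [0.1, 0.8, 0.5],
}
toy_ratings = {
    ("dog", "wolf"): 8.5,
    ("car", "truck"): 8.0,
    ("dog", "car"): 1.5,
    ("wolf", "truck"): 1.0,
}

print(round(score_embedding(toy_vectors, toy_ratings), 3))
```

Under the contextual-constraint idea described in the abstract, one would compute the same score for embeddings trained on a context-restricted corpus and for embeddings trained on an unrestricted corpus, using human ratings collected within that context, and compare the two correlations. How the authors build the restricted corpora and which embedding algorithms they train are not specified here.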
