CL Sep 5, 2018

Firearms and Tigers are Dangerous, Kitchen Knives and Zebras are Not: Testing whether Word Embeddings Can Tell

arXiv:1809.01375v1 32.01097 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses a fundamental question in natural language processing, namely the limits of what word embeddings capture, which matters to both researchers and practitioners. Its contribution is incremental, however, because it builds on existing datasets and methods.

The paper asks which semantic properties word embeddings capture. It proposes a method that compares supervised classifiers with full-vector cosine similarity on a human-elicited property dataset extended with crowd-judged negative examples. The results indicate that embeddings capture interaction-relevant properties such as 'dangerous' but not perceptual ones such as colors, which the authors treat as an initial validation of the method.

This paper presents an approach for investigating the nature of semantic information captured by word embeddings. We propose a method that extends an existing human-elicited semantic property dataset with gold negative examples using crowd judgments. Our experimental approach tests the ability of supervised classifiers to identify semantic features in word embedding vectors and compares this to a feature-identification method based on full vector cosine similarity. The idea behind this method is that properties identified by classifiers, but not through full vector comparison, are captured by embeddings. Properties that cannot be identified by either method are not. Our results provide an initial indication that semantic properties relevant for the way entities interact (e.g. dangerous) are captured, while perceptual information (e.g. colors) is not represented. We conclude that, though preliminary, these results show that our method is suitable for identifying which properties are captured by embeddings.
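
The contrast the abstract draws between a supervised classifier and full-vector cosine similarity can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and not the paper's setup: the vectors are synthetic rather than real embeddings, the property is planted in a single low-variance dimension (`prop_dim`), the classifier is a hand-written logistic regression on standardized features, and the cosine baseline compares each word to the centroids of positive and negative training words. All function names are hypothetical.

```python
import numpy as np


def make_toy_embeddings(n_words=1000, dim=50, prop_dim=7, seed=0):
    """Synthetic vectors (not real embeddings): one low-variance dimension
    encodes a binary property; all other dimensions are unrelated noise."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n_words)  # 1 = word has the property
    vecs = rng.normal(size=(n_words, dim))
    vecs[:, prop_dim] = 0.2 * (2 * labels - 1) + 0.05 * rng.normal(size=n_words)
    return vecs, labels


def train_logistic(X, y, lr=0.1, steps=500):
    """Plain gradient-descent logistic regression (an assumed classifier)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        err = p - y
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b


def classifier_accuracy(X_tr, y_tr, X_te, y_te):
    # Standardize with training statistics so the small-scale property
    # dimension is on the same footing as the noise dimensions.
    mu, sigma = X_tr.mean(axis=0), X_tr.std(axis=0)
    w, b = train_logistic((X_tr - mu) / sigma, y_tr)
    pred = ((X_te - mu) / sigma) @ w + b > 0
    return float((pred == y_te).mean())


def cosine_accuracy(X_tr, y_tr, X_te, y_te):
    """Full-vector baseline (assumed form): label a word positive if it is
    closer in cosine similarity to the positive centroid than the negative."""
    def unit(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    pos = unit(X_tr[y_tr == 1].mean(axis=0))
    neg = unit(X_tr[y_tr == 0].mean(axis=0))
    Xu = unit(X_te)
    pred = Xu @ pos > Xu @ neg
    return float((pred == y_te).mean())


def run_demo(split=750):
    X, y = make_toy_embeddings()
    X_tr, y_tr, X_te, y_te = X[:split], y[:split], X[split:], y[split:]
    return (classifier_accuracy(X_tr, y_tr, X_te, y_te),
            cosine_accuracy(X_tr, y_tr, X_te, y_te))


if __name__ == "__main__":
    clf_acc, cos_acc = run_demo()
    print(f"classifier accuracy:      {clf_acc:.2f}")
    print(f"cosine baseline accuracy: {cos_acc:.2f}")
```

In this toy setting the property is present in the vectors but contributes little to overall vector direction, so a classifier that can weight individual dimensions should recover it while the full-vector comparison stays near chance. That mirrors the abstract's reasoning for treating "found by the classifier but not by cosine similarity" as evidence that a property is captured.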
