Evaluating Word Embeddings in Multi-label Classification Using Fine-grained Name Typing
This work addresses the need for more precise evaluation of word embeddings in natural language processing, which matters particularly to researchers and practitioners working with large-scale knowledge bases. The contribution is incremental in that it builds on existing multi-label classification approaches.
The authors propose a new evaluation method for word embeddings based on multi-label classification for fine-grained name typing: identifying all types a name can refer to, using only its embedding. To support this, they build large, fine-grained datasets that assess embedding properties directly, without confounding factors such as sentence context.
Embedding models typically associate each word with a single real-valued vector that represents its different properties. Evaluation methods therefore need to analyze how accurately and completely these properties are captured in the embeddings, which requires fine-grained analysis of embedding subspaces. Multi-label classification is an appropriate way to do so. We propose a new evaluation method for word embeddings based on multi-label classification that takes a word embedding as its input. The task we use is fine-grained name typing: given a large corpus, find all types that a name can refer to based on the name embedding. Given the scale of entities in knowledge bases, we can build datasets for this task that complement current embedding evaluation datasets in three respects: they are very large, they contain fine-grained classes, and they allow the direct evaluation of embeddings without confounding factors like sentence context.
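
The evaluation setup described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names, the three-dimensional vectors, the type inventory, and the choice of one independent logistic-regression classifier per type are all assumptions made for illustration, and micro-averaged F1 is used here simply as one common multi-label metric. The classifier sees only the name embedding, never a sentence context.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(w, b, x):
    """Probability that embedding x carries the type modeled by (w, b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_one_vs_rest(embeddings, label_sets, types, epochs=500, lr=0.5):
    """Fit one binary logistic-regression classifier per type with plain SGD."""
    dim = len(embeddings[0])
    models = {}
    for t in types:
        w, b = [0.0] * dim, 0.0
        for _ in range(epochs):
            for x, labels in zip(embeddings, label_sets):
                y = 1.0 if t in labels else 0.0
                g = score(w, b, x) - y  # gradient of the log loss w.r.t. the logit
                w = [wi - lr * g * xi for wi, xi in zip(w, x)]
                b -= lr * g
        models[t] = (w, b)
    return models

def predict_types(models, x, threshold=0.5):
    """Return every type whose classifier fires: a name may receive several."""
    return {t for t, (w, b) in models.items() if score(w, b, x) > threshold}

def micro_f1(gold, pred):
    """Micro-averaged F1 over all (name, type) decisions."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)

# Toy data (hypothetical): hand-made 3-d "embeddings" and invented type sets.
types = ["location", "person", "organization"]
names = ["washington", "berlin", "einstein", "ford", "ibm"]
embeddings = [
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
]
gold = [
    {"location", "person"},
    {"location"},
    {"person"},
    {"person", "organization"},
    {"organization"},
]

models = train_one_vs_rest(embeddings, gold, types)
predicted = [predict_types(models, x) for x in embeddings]
for name, p in zip(names, predicted):
    print(name, sorted(p))
print("micro-F1 on the toy training set:", micro_f1(gold, predicted))
```

For brevity the sketch scores the same names it was trained on; a meaningful evaluation would hold out a separate set of names and report the metric on those only.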