CL | Jun 24, 2016

Evaluation method of word embedding by roots and affixes

arXiv:1606.07601v1

Originality: Incremental advance

AI Analysis

This work addresses the interpretability of word embeddings and is aimed at NLP researchers. The contribution is incremental because it builds on existing evaluation methods.

The paper tackles the problem of interpreting word embedding dimensions. It proposes a roots and affixes model (RAAM) that uses information entropy to divide the dimensions into two categories. It reports a negative linear relation between the two resulting attributes, and a high positive correlation between RAAM and downstream semantic evaluation tasks.

Word embedding has been shown to be remarkably effective in many Natural Language Processing tasks. However, existing models still have limitations in interpreting the dimensions of word vectors. In this paper, we provide a new approach, the roots and affixes model (RAAM), which interprets these dimensions from the intrinsic structures of natural language. It can also be used as an evaluation measure of the quality of word embeddings. We introduce information entropy into our model and divide the dimensions into two categories, analogous to roots and affixes in lexical semantics. We then consider each category as a whole rather than each dimension individually. We experimented with an English Wikipedia corpus. Our results show that there is a negative linear relation between the two attributes and a high positive correlation between our model and downstream semantic evaluation tasks.
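The abstract outlines RAAM only at a high level: compute an entropy per dimension, split the dimensions into two categories, then treat each category as a whole. The sketch below illustrates that general idea under explicit assumptions that are not taken from the paper:

* Entropy is estimated from an equal-width histogram of each dimension's values across the vocabulary.
* The two categories are formed by cutting at the median entropy.
* "As a whole" is approximated by the mean entropy of a group.

The function names `dimension_entropy`, `split_dimensions` and `group_mean_entropy`, and the `bins` parameter, are hypothetical. The sketch also does not claim which group corresponds to "roots" and which to "affixes".

```python
import math
from typing import List, Sequence, Tuple


def dimension_entropy(values: Sequence[float], bins: int = 10) -> float:
    """Shannon entropy (bits) of one embedding dimension, estimated from an
    equal-width histogram of its values across the vocabulary.
    ASSUMPTION: the paper's exact estimator is not given in the abstract."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant dimension falls into one bin: zero entropy
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bin instead of overflowing.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def split_dimensions(
    vectors: List[List[float]], bins: int = 10
) -> Tuple[List[int], List[int]]:
    """HYPOTHETICAL split: rank dimensions by entropy and cut at the median.
    Returns (high_entropy_dims, low_entropy_dims); with an odd number of
    dimensions the extra one goes to the high-entropy group."""
    num_dims = len(vectors[0])
    entropies = [
        dimension_entropy([vec[d] for vec in vectors], bins) for d in range(num_dims)
    ]
    ranked = sorted(range(num_dims), key=lambda d: entropies[d])
    half = num_dims // 2
    return sorted(ranked[half:]), sorted(ranked[:half])


def group_mean_entropy(
    vectors: List[List[float]], dims: List[int], bins: int = 10
) -> float:
    """Treat a category as a whole by averaging the entropy of its dimensions.
    ASSUMPTION: one simple aggregate; the paper's aggregate may differ."""
    if not dims:
        return 0.0
    total = sum(dimension_entropy([vec[d] for vec in vectors], bins) for d in dims)
    return total / len(dims)


if __name__ == "__main__":
    # Toy 4-word, 3-dimensional "embedding" (made-up numbers for illustration).
    toy = [
        [0.0, 5.0, 0.0],
        [0.0, 5.0, 1.0],
        [0.0, 5.0, 2.0],
        [1.0, 5.0, 3.0],
    ]
    high, low = split_dimensions(toy, bins=4)
    print(high, low)  # [0, 2] [1]
    print(group_mean_entropy(toy, low, bins=4))  # 0.0
```

In the toy example, dimension 1 is constant and gets zero entropy, so it alone forms the low-entropy group. A median cut is just one way to obtain two categories; a threshold or a clustering step would fit the same outline.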
