CL · Jul 4, 2019

Morphological Word Embeddings

arXiv:1907.02423v1 · 104 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for better morphological encoding in NLP for morphologically rich languages, but the advance is incremental because it builds on existing embedding methods.

The paper tackles the problem of capturing morphological similarity in word embeddings by extending the log-bilinear model with morphologically annotated data, and it demonstrates the resulting embeddings in a German case study.

Linguistic similarity is multi-faceted. For instance, two words may be similar with respect to semantics, syntax, or morphology inter alia. Continuous word-embeddings have been shown to capture most of these shades of similarity to some degree. This work considers guiding word-embeddings with morphologically annotated data, a form of semi-supervised learning, encouraging the vectors to encode a word's morphology, i.e., words close in the embedded space share morphological features. We extend the log-bilinear model to this end and show that indeed our learned embeddings achieve this, using German as a case study.
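To make the idea concrete, the following is a minimal sketch of a log-bilinear scorer that predicts the next word and its morphological tag from the same context vector. It is an illustration under stated assumptions, not the authors' implementation: the function names, the additive word-plus-tag score, and the omission of bias terms are all hypothetical choices made here for brevity.

```python
import math

def matvec(M, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def log_softmax(scores):
    # Numerically stable log-softmax over a list of scores.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def predicted_vector(context_vecs, position_mats):
    # Log-bilinear prediction: sum of position-specific linear maps
    # applied to the embeddings of the context words.
    pred = [0.0] * len(context_vecs[0])
    for C, q in zip(position_mats, context_vecs):
        pred = [p + x for p, x in zip(pred, matvec(C, q))]
    return pred

def joint_log_prob(pred, word_embs, tag_embs, word_idx, tag_idx):
    # Assumption: the joint score of (word, tag) is additive, so the joint
    # softmax factors into a word softmax times a tag softmax.
    word_lp = log_softmax([dot(pred, r) for r in word_embs])
    tag_lp = log_softmax([dot(pred, s) for s in tag_embs])
    return word_lp[word_idx] + tag_lp[tag_idx]
```

Because the tag term shares the predicted vector with the word term, training on annotated data would push context representations, and hence the word embeddings that produce them, toward encoding morphological features. Maximizing `joint_log_prob` over a tagged corpus is one way to realize that; the training loop is left out of this sketch.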
