CL, Jul 4, 2019

Morphological Word Embeddings

arXiv:1907.02423v1, 104 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for better morphological encoding in NLP for languages with rich morphology, but it is incremental as it builds on existing embedding methods.

The paper tackles the problem of capturing morphological similarity in word embeddings by extending the log-bilinear model with morphologically annotated data. A German case study shows that the learned embeddings place words with shared morphological features close together.

Linguistic similarity is multi-faceted. For instance, two words may be similar with respect to semantics, syntax, or morphology inter alia. Continuous word-embeddings have been shown to capture most of these shades of similarity to some degree. This work considers guiding word-embeddings with morphologically annotated data, a form of semi-supervised learning, encouraging the vectors to encode a word's morphology, i.e., words close in the embedded space share morphological features. We extend the log-bilinear model to this end and show that indeed our learned embeddings achieve this, using German as a case study.
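The abstract does not spell out how the log-bilinear model is extended, so the following is only a minimal sketch under one assumption: that the model predicts a morphological tag alongside the next word, with both predictions sharing the same context representation. All names (`R`, `Q`, `S`, `C`, `lbl_morph_forward`, `joint_nll`) and the toy sizes are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D score vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def lbl_morph_forward(context_ids, R, Q, S, C, b_word, b_tag):
    """Return (p_word, p_tag) for the next position given context word ids.

    R: context word embeddings, shape (V, d)
    Q: target word embeddings,  shape (V, d)
    S: tag embeddings (assumed extension), shape (T, d)
    C: one (d, d) matrix per context position, shape (n, d, d)
    """
    # Log-bilinear prediction: sum of position-specific linear maps
    # applied to the context word vectors.
    pred = sum(C[j] @ R[w] for j, w in enumerate(context_ids))
    p_word = softmax(Q @ pred + b_word)  # distribution over next words
    p_tag = softmax(S @ pred + b_tag)    # distribution over morphological tags
    return p_word, p_tag

def joint_nll(p_word, p_tag, word_id, tag_id):
    # Joint negative log-likelihood of the observed word and its tag.
    return -(np.log(p_word[word_id]) + np.log(p_tag[tag_id]))

# Toy setup: 6 words, 3 tags, 4-dimensional embeddings, 2 context words.
rng = np.random.default_rng(0)
V, T, d, n = 6, 3, 4, 2
R = rng.normal(size=(V, d))
Q = rng.normal(size=(V, d))
S = rng.normal(size=(T, d))
C = rng.normal(size=(n, d, d))
b_word = np.zeros(V)
b_tag = np.zeros(T)

p_word, p_tag = lbl_morph_forward([1, 4], R, Q, S, C, b_word, b_tag)
loss = joint_nll(p_word, p_tag, word_id=2, tag_id=0)
```

Because the tag loss is backpropagated through the shared prediction, gradients from annotated tokens would push words with the same tag toward similar vectors, which is the behaviour the abstract describes. Unannotated tokens could simply skip the tag term, which is one way to read the "semi-supervised" framing.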
