CL, May 21, 2018

Morphological analysis using a sequence decoder

arXiv:1805.07946
Originality: Incremental advance
AI Analysis

This work addresses morphological analysis for natural language processing, which particularly benefits low-resource and morphologically complex languages. The contribution is incremental, since it builds on existing encoder-decoder methods.

The authors introduce Morse, a recurrent encoder-decoder model that generates morphological features as a sequence. This lets it handle rare tags and morphologically complex languages, and it achieves state-of-the-art results in nine languages under low-resource, high-resource and transfer learning settings.

We introduce Morse, a recurrent encoder-decoder model that produces morphological analyses of each word in a sentence. The encoder turns the relevant information about the word and its context into a fixed-size vector representation, and the decoder generates the sequence of characters for the lemma followed by a sequence of individual morphological features. We show that generating morphological features individually rather than as a combined tag allows the model to handle rare or unseen tags and outperform whole-tag models. In addition, generating morphological features as a sequence rather than, e.g., an unordered set allows our model to produce an arbitrary number of features that represent multiple inflectional groups in morphologically complex languages. We obtain state-of-the-art results in nine languages of different morphological complexity under low-resource, high-resource and transfer learning settings. We also introduce TrMor2018, a new high accuracy Turkish morphology dataset. Our Morse implementation and the TrMor2018 dataset are available online to support future research: see https://github.com/ai-ku/Morse.jl for a Morse implementation in Julia/Knet and https://github.com/ai-ku/TrMor2018 for the new Turkish dataset.
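The output format described in the abstract (lemma characters followed by individual features) can be illustrated with a small sketch. This is not the Morse implementation. The helper `target_sequence`, the `EOS` marker, the example words, and the feature labels are all hypothetical and chosen only for illustration. The second half shows why per-feature generation can cover an analysis whose combined tag never appeared in training.

```python
from typing import List, Set, Tuple

EOS = "</s>"  # hypothetical end-of-sequence marker, not taken from the paper


def target_sequence(lemma: str, features: List[str]) -> List[str]:
    """Build a decoder target: lemma characters, then one token per feature."""
    return list(lemma) + features + [EOS]


# Toy training analyses as (lemma, features); labels are illustrative only.
train: List[Tuple[str, List[str]]] = [
    ("kitap", ["Noun", "A3sg", "Pnon", "Nom"]),
    ("ev", ["Noun", "A3pl", "P1sg", "Loc"]),
]
# A held-out analysis that combines features in a way not seen in training.
test: Tuple[str, List[str]] = ("masa", ["Noun", "A3pl", "Pnon", "Loc"])

# Whole-tag view: each full feature combination is one atomic label.
whole_tags: Set[str] = {"+".join(feats) for _, feats in train}
# Per-feature view: each feature is its own output token.
feature_vocab: Set[str] = {f for _, feats in train for f in feats}

unseen_as_whole_tag = "+".join(test[1]) not in whole_tags
covered_by_features = all(f in feature_vocab for f in test[1])

print(target_sequence(*train[1]))
print(unseen_as_whole_tag, covered_by_features)
```

In this toy setup a whole-tag classifier has no label for the held-out combination, while a sequence decoder could in principle still emit it, because every individual feature is in its output vocabulary.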

Code Implementations: 2 repos