CL · LG · NE · Aug 10, 2018

LemmaTag: Jointly Tagging and Lemmatizing for Morphologically-Rich Languages with BRNNs

arXiv:1808.03703v2 · 35 citations
Originality: Incremental advance
AI Analysis

The paper addresses morphological analysis for languages with complex morphology and offers a practical tool for NLP applications in those languages.

The paper tackles joint part-of-speech tagging and lemmatization for morphologically-rich languages using a neural network, achieving state-of-the-art accuracy in Czech, German, and Arabic.

We present LemmaTag, a featureless neural network architecture that jointly generates part-of-speech tags and lemmas for sentences by using bidirectional RNNs with character-level and word-level embeddings. We demonstrate that both tasks benefit from sharing the encoding part of the network, predicting tag subcategories, and using the tagger output as an input to the lemmatizer. We evaluate our model on several languages with complex morphology, where it surpasses state-of-the-art accuracy in both part-of-speech tagging and lemmatization in Czech, German, and Arabic.
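The data flow described in the abstract (a shared encoder built from character-level and word-level embeddings, a bidirectional RNN over the sentence, tag-subcategory outputs, and a lemmatizer that receives the tagger's output) can be sketched as a toy forward pass. This is an illustrative sketch under stated assumptions, not the authors' implementation: the class name `ToyLemmaTagEncoder`, the sizes, the subcategory names in `SUBCATS`, and the plain tanh RNN cells are all hypothetical, the weights are random and untrained, and the decoder that would actually produce the lemma is omitted.

```python
import math
import random

DIM = 4                          # hypothetical embedding / hidden size
SUBCATS = {"POS": 3, "Case": 4}  # hypothetical tag subcategories and label counts

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(z):
    top = max(z)
    e = [math.exp(x - top) for x in z]
    s = sum(e)
    return [x / s for x in e]

def rnn(inputs, w_in, w_rec):
    """Plain tanh RNN (a stand-in for the paper's cells); one state per input."""
    h = [0.0] * len(w_rec)
    out = []
    for x in inputs:
        h = [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]
        out.append(h)
    return out

def birnn(inputs, fwd, bwd):
    """Concatenate forward states with backward states re-aligned to input order."""
    f = rnn(inputs, *fwd)
    b = rnn(inputs[::-1], *bwd)[::-1]
    return [x + y for x, y in zip(f, b)]

class ToyLemmaTagEncoder:
    """Hypothetical toy model: shared encoder, per-subcategory tag heads."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        rng = self.rng
        self.char_emb, self.word_emb = {}, {}
        # Character-level RNN: DIM -> DIM.
        self.char_rnn = (rand_matrix(DIM, DIM, rng), rand_matrix(DIM, DIM, rng))
        # Sentence-level BiRNN over [word emb ; char summary] (2*DIM) -> DIM per direction.
        self.fwd = (rand_matrix(DIM, 2 * DIM, rng), rand_matrix(DIM, DIM, rng))
        self.bwd = (rand_matrix(DIM, 2 * DIM, rng), rand_matrix(DIM, DIM, rng))
        # One output layer per tag subcategory, all reading the shared 2*DIM state.
        self.heads = {k: rand_matrix(n, 2 * DIM, rng) for k, n in SUBCATS.items()}

    def _vec(self, table, key):
        # Lazily create a random embedding for unseen characters/words.
        if key not in table:
            table[key] = [self.rng.uniform(-1, 1) for _ in range(DIM)]
        return table[key]

    def word_repr(self, word):
        chars = [self._vec(self.char_emb, c) for c in word]
        char_summary = rnn(chars, *self.char_rnn)[-1]  # last char-RNN state
        return self._vec(self.word_emb, word) + char_summary

    def forward(self, sentence):
        shared = birnn([self.word_repr(w) for w in sentence], self.fwd, self.bwd)
        results = []
        for h in shared:
            tags = {k: softmax(matvec(W, h)) for k, W in self.heads.items()}
            # Lemmatizer input: shared encoder state plus the tagger's distributions.
            lemma_input = h + [p for k in SUBCATS for p in tags[k]]
            results.append((tags, lemma_input))
        return results
```

For example, `ToyLemmaTagEncoder(seed=1).forward(["Psi", "štěkají"])` returns one `(tags, lemma_input)` pair per word, and each `lemma_input` has length `2 * DIM + 3 + 4 = 15`. Both tasks read the same encoder states, which is the sharing the abstract credits for the gains; feeding the tag distributions into the lemmatizer input mirrors the stated use of tagger output by the lemmatizer.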

Code Implementations: 2 repos
