CL Nov 14, 2017

From Word Segmentation to POS Tagging for Vietnamese

arXiv:1711.04951v11096 citations
Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses part-of-speech tagging for Vietnamese. Its contribution is an incremental improvement in natural language processing for a specific language.

The paper tackles Vietnamese POS tagging from unsegmented text by comparing pipeline and joint strategies. It finds that the pipeline approach with a feature-based model achieves the highest accuracy on the benchmark Vietnamese treebank (Nguyen et al., 2009).

This paper presents an empirical comparison of two strategies for Vietnamese Part-of-Speech (POS) tagging from unsegmented text: (i) a pipeline strategy where we consider the output of a word segmenter as the input of a POS tagger, and (ii) a joint strategy where we predict a combined segmentation and POS tag for each syllable. We also make a comparison between state-of-the-art (SOTA) feature-based and neural network-based models. On the benchmark Vietnamese treebank (Nguyen et al., 2009), experimental results show that the pipeline strategy produces better scores of POS tagging from unsegmented text than the joint strategy, and the highest accuracy is obtained by using a feature-based model.
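The joint strategy described above, one combined segmentation and POS label per syllable, can be illustrated with a small Python sketch. This is not the paper's code. It assumes a B/I prefix scheme (`B-` for the first syllable of a word, `I-` for the rest), assumes that syllables of a multi-syllable word are joined with `_`, and uses the tags `N` and `V` only as illustrative labels. The function names `to_joint_labels` and `from_joint_labels` are hypothetical.

```python
from typing import List, Tuple

# Assumption: a word's syllables are joined with "_" (e.g. "sinh_viên"),
# and joint labels use a B/I prefix plus the word's POS tag.


def to_joint_labels(tagged_words: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Flatten (word, POS) pairs into (syllable, joint label) pairs."""
    labeled = []
    for word, pos in tagged_words:
        for i, syllable in enumerate(word.split("_")):
            prefix = "B" if i == 0 else "I"  # B = word start, I = word continuation
            labeled.append((syllable, f"{prefix}-{pos}"))
    return labeled


def from_joint_labels(labeled: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Rebuild (word, POS) pairs from per-syllable joint labels."""
    words: List[Tuple[List[str], str]] = []
    for syllable, label in labeled:
        prefix, pos = label.split("-", 1)
        if prefix == "B" or not words:
            # Start a new word; a stray leading "I-" label also opens a word.
            words.append(([syllable], pos))
        else:
            # Continue the current word; its POS comes from the "B-" label.
            words[-1][0].append(syllable)
    return [("_".join(syllables), pos) for syllables, pos in words]


sentence = [("sinh_viên", "N"), ("học", "V"), ("bài", "N")]
joint = to_joint_labels(sentence)
restored = from_joint_labels(joint)
```

In the pipeline strategy, by contrast, a word segmenter would first produce the `_`-joined words and a separate tagger would then assign one tag per word. With the joint encoding, a single per-syllable sequence labeler produces both decisions at once, and `from_joint_labels` recovers the segmented, tagged sentence.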

Code Implementations: 2 repos