CL · Jun 14, 2022

An Experimental Investigation of Part-Of-Speech Taggers for Vietnamese

arXiv:2206.06992v1 · 2 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses part-of-speech tagging for Vietnamese and offers an incremental improvement for NLP applications in that language.

The paper builds two new Vietnamese taggers on ClearNLP and Stanford Tagger with a new feature set. These taggers achieve the highest tagging accuracy among the Vietnamese taggers compared, and the experiments show that RDRPOSTagger runs significantly faster than the statistical taggers.

Part-of-speech (POS) tagging plays an important role in Natural Language Processing (NLP). Its applications can be found in many NLP tasks such as named entity recognition, syntactic parsing, dependency parsing and text chunking. In this paper, we use two widely used toolkits, ClearNLP and Stanford POS Tagger, to develop two new POS taggers for Vietnamese, then compare them to three well-known Vietnamese taggers, namely JVnTagger, vnTagger and RDRPOSTagger. We make a systematic comparison to identify the best-performing tagger. We also design a new feature set to measure the performance of the statistical taggers. Our new taggers, built from Stanford Tagger and ClearNLP with the new feature set, outperform all other current Vietnamese taggers in terms of tagging accuracy. Moreover, we analyze the effect of individual features on the performance of the statistical taggers. Lastly, the experimental results reveal that the transformation-based tagger, RDRPOSTagger, runs significantly faster than any of the statistical taggers.
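The two quantities compared in the abstract, tagging accuracy and tagging speed, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `tagging_accuracy` and `words_per_second`, the tags, and the toy sentence are assumptions made for illustration, and accuracy is assumed here to mean the fraction of tokens whose predicted tag matches the gold tag.

```python
import time
from typing import Callable, List, Sequence, Tuple

# A tagged sentence is a list of (word, tag) pairs.
Tagged = List[Tuple[str, str]]


def tagging_accuracy(gold: Sequence[Tagged], predicted: Sequence[Tagged]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of sentences")
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentence lengths differ; tokenization must match")
        for (_, gold_tag), (_, pred_tag) in zip(gold_sent, pred_sent):
            correct += int(gold_tag == pred_tag)
            total += 1
    return correct / total if total else 0.0


def words_per_second(
    tagger: Callable[[List[str]], Tagged],
    sentences: Sequence[List[str]],
) -> float:
    """Tokens tagged per second of wall-clock time for a tagger callable."""
    start = time.perf_counter()
    n_words = 0
    for sentence in sentences:
        tagger(sentence)
        n_words += len(sentence)
    elapsed = time.perf_counter() - start
    return n_words / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Toy data with illustrative tags (hypothetical, not taken from the paper).
    gold = [[("Tôi", "P"), ("học", "V"), ("tiếng_Việt", "N")]]
    predicted = [[("Tôi", "P"), ("học", "N"), ("tiếng_Việt", "N")]]
    print(f"accuracy = {tagging_accuracy(gold, predicted):.3f}")

    # Stand-in tagger that labels every word "N"; a real tagger would go here.
    def dummy_tagger(words: List[str]) -> Tagged:
        return [(w, "N") for w in words]

    speed = words_per_second(dummy_tagger, [[w for w, _ in s] for s in gold])
    print(f"speed = {speed:.0f} words/second")
```

For a fair speed comparison, each tagger would be wrapped as a callable and run on the same test sentences on the same machine, with model loading kept outside the timed loop.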
