CL | Jan 5, 2021

PhoNLP: A joint multi-task learning model for Vietnamese part-of-speech tagging, named entity recognition and dependency parsing

arXiv:2101.01476v2 | 31.97 | 34 citations | h-index: 25 | Has Code

Originality: Incremental advance

AI Analysis

This work provides a strong baseline and toolkit for NLP research and applications in Vietnamese, and potentially other languages, by improving performance on fundamental tasks.

This paper introduces PhoNLP, the first multi-task learning model for joint Vietnamese part-of-speech tagging, named entity recognition, and dependency parsing. It achieves state-of-the-art results on Vietnamese benchmark datasets, outperforming single-task fine-tuning of the PhoBERT model.

We present the first multi-task learning model, named PhoNLP, for joint Vietnamese part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing. Experiments on Vietnamese benchmark datasets show that PhoNLP produces state-of-the-art results, outperforming a single-task learning approach that fine-tunes the pre-trained Vietnamese language model PhoBERT (Nguyen and Nguyen, 2020) for each task independently. We publicly release PhoNLP as an open-source toolkit under the Apache License 2.0. Although we specify PhoNLP for Vietnamese, its training and evaluation command scripts can in fact be used directly for other languages that have a pre-trained BERT-based language model and gold annotated corpora available for the three tasks of POS tagging, NER and dependency parsing. We hope that PhoNLP can serve as a strong baseline and useful toolkit for future NLP research and applications, not only for Vietnamese but also for other languages. PhoNLP is available at: https://github.com/VinAIResearch/PhoNLP
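The contrast between joint multi-task learning and independent single-task fine-tuning can be illustrated with a toy sketch. This is not PhoNLP's architecture, which the abstract does not describe. It is a generic illustration in which one shared feature vector feeds three task heads and the per-task losses are summed into a single training objective. All names (`shared_features`, `heads`, `joint_loss`) and all numbers are hypothetical, and dependency parsing is simplified here to a per-token label choice.

```python
import math

def softmax_cross_entropy(scores, gold):
    """Cross-entropy of one gold class index under softmax(scores)."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[gold]

def linear_head(features, weights):
    """Score each label as a dot product between features and a weight row."""
    return [sum(f * w for f, w in zip(features, row)) for row in weights]

def joint_loss(features, heads, gold):
    """Sum the per-task losses, all computed from the SAME shared features."""
    return sum(
        softmax_cross_entropy(linear_head(features, heads[task]), gold[task])
        for task in heads
    )

# Hypothetical shared encoder output for one token. In a real system this
# would come from a pre-trained BERT-style encoder (assumption).
shared_features = [0.5, -1.0, 2.0]

# Hypothetical task heads: one weight row per label.
heads = {
    "pos": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "ner": [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]],
    "dep": [[0.5, 0.5, 0.5], [0.0, 0.0, 0.0], [1.0, -1.0, 0.0]],
}
gold = {"pos": 0, "ner": 0, "dep": 2}

total = joint_loss(shared_features, heads, gold)
print(f"joint loss over three tasks: {total:.4f}")
```

In a single-task setup, each head would be trained with its own separately fine-tuned encoder. In the joint setup sketched here, gradients from all three losses would update one shared encoder, which is the general idea behind multi-task learning.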
