CL, LG | Mar 31, 2021

Joint Khmer Word Segmentation and Part-of-Speech Tagging Using Deep Learning

arXiv:2103.16801v1 | 8 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses a domain-specific problem in Khmer natural language processing, but the contribution is incremental: the proposed model matches rather than surpasses existing methods.

The paper tackles Khmer part-of-speech tagging, which traditionally requires a separate word segmentation step, by proposing a joint deep learning model that performs both tasks simultaneously. The model achieves performance comparable to the conventional two-stage approach.

Khmer text is written from left to right with optional spaces. A space does not serve as a word boundary; instead, it is used for readability or other functional purposes. Word segmentation is a prerequisite for downstream tasks such as part-of-speech (POS) tagging, so the robustness of POS tagging depends heavily on the quality of word segmentation. Conventional Khmer POS tagging is a two-stage process: word segmentation first, followed by tagging of each segmented word. This work proposes a joint word segmentation and POS tagging approach using a single deep learning model, so that both tasks can be performed simultaneously. The proposed model was trained and tested on the publicly available Khmer POS dataset. Validation suggests that the performance of the joint model is on par with conventional two-stage POS tagging.
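To make the idea of joint prediction concrete, the sketch below shows one common way to fold segmentation and POS tagging into a single per-character labeling problem. It is an illustration only, not necessarily the label scheme used in the paper: the `B-`/`I-` prefixes, the function names `encode` and `decode`, the romanized placeholder words, and the tag names are all assumptions made for this example. A model that predicts one such label per character yields word boundaries and POS tags in a single pass.

```python
from typing import List, Tuple

TaggedWord = Tuple[str, str]  # (word, POS tag); hypothetical representation


def encode(tagged_words: List[TaggedWord]) -> Tuple[List[str], List[str]]:
    """Turn (word, tag) pairs into parallel character and label sequences.

    Assumed scheme: the first character of a word gets "B-<tag>",
    every following character of the same word gets "I-<tag>".
    """
    chars: List[str] = []
    labels: List[str] = []
    for word, tag in tagged_words:
        for i, ch in enumerate(word):
            chars.append(ch)
            labels.append(("B-" if i == 0 else "I-") + tag)
    return chars, labels


def decode(chars: List[str], labels: List[str]) -> List[TaggedWord]:
    """Recover words and POS tags from per-character labels."""
    words: List[TaggedWord] = []
    current, current_tag = "", None
    for ch, label in zip(chars, labels):
        prefix, tag = label.split("-", 1)
        # A "B-" label (or a stray "I-" at the very start) opens a new word.
        if prefix == "B" or current_tag is None:
            if current:
                words.append((current, current_tag))
            current, current_tag = ch, tag
        else:
            current += ch
    if current:
        words.append((current, current_tag))
    return words


# Placeholder romanized tokens and tags, written without spaces between words.
sentence = [("khnhom", "PRO"), ("tov", "VB"), ("sala", "NN")]
chars, labels = encode(sentence)
recovered = decode(chars, labels)
```

In this sketch, the unsegmented input is `"".join(chars)`, and decoding the predicted labels returns both the word boundaries and the tags, so no separate segmentation stage is needed. In a two-stage pipeline, by contrast, a segmentation error made in the first stage is passed on to the tagger unchanged.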

