CL · Sep 19, 2017

A Fast and Accurate Vietnamese Word Segmenter

arXiv:1709.06307v21108 citations · Has Code
AI Analysis

This work addresses word segmentation for Vietnamese language processing, offering incremental improvements over existing tools.

The authors tackled Vietnamese word segmentation by introducing a method based on Single Classification Ripple Down Rules, achieving state-of-the-art results with improved accuracy and speed on the benchmark Vietnamese treebank.

We propose a novel approach to Vietnamese word segmentation. Our approach is based on the Single Classification Ripple Down Rules methodology (Compton and Jansen, 1990), where rules are stored in an exception structure and new rules are only added to correct segmentation errors given by existing rules. Experimental results on the benchmark Vietnamese treebank show that our approach outperforms previous state-of-the-art approaches JVnSegmenter, vnTokenizer, DongDu and UETsegmenter in terms of both accuracy and performance speed. Our code is open-source and available at: https://github.com/datquocnguyen/RDRsegmenter.
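The exception structure described in the abstract can be illustrated with a minimal sketch of a generic Single Classification Ripple Down Rules tree. This is not the authors' implementation: the `Node`, `classify`, and `add_correction` names, the `B`/`I` labels (begin word / inside word), and the `prev`/`cur` syllable features are all hypothetical choices made for illustration. Each node holds a condition and a conclusion; when a rule fires, its exception branch is tried next, otherwise its if-not branch is tried. A new rule is attached only when the current tree gives the wrong answer for a case.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Case = dict  # hypothetical feature dictionary for one syllable position


@dataclass
class Node:
    condition: Callable[[Case], bool]
    conclusion: str
    except_child: Optional["Node"] = None  # tried next when this rule fires
    if_not_child: Optional["Node"] = None  # tried next when it does not fire


def classify(root: Node, case: Case):
    """Walk the tree; return (conclusion, last visited node, whether it fired)."""
    node, conclusion, last, fired = root, None, None, False
    while node is not None:
        last = node
        fired = node.condition(case)
        if fired:
            conclusion = node.conclusion  # the deepest firing rule wins
            node = node.except_child
        else:
            node = node.if_not_child
    return conclusion, last, fired


def add_correction(root: Node, case: Case,
                   condition: Callable[[Case], bool], correct: str) -> bool:
    """Add a rule only if the existing rules misclassify `case`."""
    conclusion, last, fired = classify(root, case)
    if conclusion == correct:
        return False  # no error, so no new rule
    assert condition(case), "the new rule must cover the case it corrects"
    new_node = Node(condition, correct)
    # Attach at the end of the evaluation path, so earlier rules are untouched.
    if fired:
        last.except_child = new_node
    else:
        last.if_not_child = new_node
    return True


# Default rule: every syllable starts a new word (assumed label "B").
root = Node(lambda c: True, "B")

# Hypothetical correction: "viên" continues the previous word ("sinh viên").
add_correction(root, {"prev": "sinh", "cur": "viên"},
               lambda c: c["cur"] == "viên", "I")

# Hypothetical exception to that rule: after "một", "viên" is its own word.
add_correction(root, {"prev": "một", "cur": "viên"},
               lambda c: c["prev"] == "một", "B")
```

Because each correction is attached where evaluation of the misclassified case ended, it can only affect cases that reach that point in the tree, which is what keeps previously added rules valid.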

Code Implementations: 1 repo
