CL · LG · ML · Jul 11, 2018

Neural Chinese Word Segmentation with Dictionary Knowledge

arXiv:1807.05849v1 · 55 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for more data-efficient Chinese word segmentation by reducing reliance on large labeled datasets, though the advance is incremental.

The paper proposes two methods for incorporating dictionary knowledge into neural Chinese word segmentation. Experiments on two benchmark datasets show that both improve performance, especially when training data is limited.

Chinese word segmentation (CWS) is an important task for Chinese NLP. Recently, many neural-network-based methods have been proposed for CWS. However, these methods require a large number of labeled sentences for model training, and usually cannot utilize the useful information in Chinese dictionaries. In this paper, we propose two methods to exploit dictionary information for CWS. The first is based on pseudo-labeled data generation, and the second is based on multi-task learning. Experimental results on two benchmark datasets validate that our approach can effectively improve the performance of Chinese word segmentation, especially when training data is insufficient.
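To make the first idea concrete, here is a minimal sketch of how pseudo-labeled data could be generated from a dictionary. The abstract does not describe the procedure, so everything below is an assumption for illustration: the random sampling of words, the BMES tagging scheme, the toy dictionary, and the helper names `bmes_tags` and `make_pseudo_sentence` are all hypothetical and not taken from the paper.

```python
import random


def bmes_tags(word):
    """Return character-level BMES tags for one word (assumed tagging scheme)."""
    if len(word) == 1:
        return ["S"]  # single-character word
    # Begin, zero or more Middle tags, then End
    return ["B"] + ["M"] * (len(word) - 2) + ["E"]


def make_pseudo_sentence(dictionary, num_words, rng):
    """Concatenate randomly sampled dictionary words into one pseudo sentence.

    Returns the characters and their aligned tags. Because the word
    boundaries are known by construction, no manual annotation is needed.
    """
    words = [rng.choice(dictionary) for _ in range(num_words)]
    chars = [ch for word in words for ch in word]
    tags = [tag for word in words for tag in bmes_tags(word)]
    return chars, tags


# Hypothetical toy dictionary, for illustration only
toy_dictionary = ["中文", "分词", "词典", "神经网络", "的"]
rng = random.Random(0)
chars, tags = make_pseudo_sentence(toy_dictionary, 4, rng)
print("".join(chars))
print(" ".join(tags))
```

Pairs produced this way could be mixed with the real labeled sentences during training. How the pseudo data is weighted against real data, and how the multi-task variant is set up, is left open here because the abstract does not specify it.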

