Neural Chinese Word Segmentation with Dictionary Knowledge
This work makes Chinese word segmentation more data-efficient by reducing its reliance on large labeled datasets, although the contribution is incremental in nature.
The paper tackles neural Chinese word segmentation by proposing two methods for incorporating dictionary knowledge. Both improve performance, especially when training data is limited, as validated on two benchmark datasets.
Chinese word segmentation (CWS) is an important task for Chinese NLP. Recently, many neural-network-based methods have been proposed for CWS. However, these methods require a large number of labeled sentences for model training, and usually cannot utilize the useful information in Chinese dictionaries. In this paper, we propose two methods to exploit dictionary information for CWS. The first is based on pseudo-labeled data generation, and the second is based on multi-task learning. Experimental results on two benchmark datasets validate that our approach can effectively improve the performance of Chinese word segmentation, especially when training data is insufficient.
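To make the first idea concrete, the following is a minimal sketch of how pseudo-labeled data could be generated from a dictionary. It is not taken from the paper: the BMES tagging scheme, uniform random sampling of words, and the helper names `bmes_tags` and `make_pseudo_sentence` are all assumptions made for illustration.

```python
import random


def bmes_tags(word):
    """Return character-level tags for one word (assumed BMES scheme).

    S = single-character word, B = begin, M = middle, E = end.
    """
    if len(word) == 1:
        return ["S"]
    return ["B"] + ["M"] * (len(word) - 2) + ["E"]


def make_pseudo_sentence(dictionary, n_words, rng):
    """Build one pseudo sentence by concatenating sampled dictionary words.

    Hypothetical helper: words are drawn uniformly at random, which is an
    assumption, not necessarily the sampling strategy used by the authors.
    Returns parallel lists of characters and their segmentation tags.
    """
    words = [rng.choice(dictionary) for _ in range(n_words)]
    chars = [ch for w in words for ch in w]
    tags = [t for w in words for t in bmes_tags(w)]
    return chars, tags


# Toy dictionary used only for illustration.
toy_dictionary = ["北京", "大学", "我", "自然语言"]
rng = random.Random(0)  # seeded so runs are repeatable
chars, tags = make_pseudo_sentence(toy_dictionary, 3, rng)
print(list(zip(chars, tags)))
```

Because the word boundaries are known by construction, each pseudo sentence comes with segmentation labels at no annotation cost, and such sentences could be mixed with the manually labeled training data.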