CL, Jan 18, 2019

Chinese Word Segmentation: Another Decade Review (2007-2017)

Hai Zhao, Deng Cai, Changning Huang, Chunyu Kit

arXiv:1901.06079v11.026 citations

Originality: Synthesis-oriented

AI Analysis

The paper provides a critical assessment for NLP researchers, highlighting incremental insights rather than breakthroughs.

This paper reviews Chinese word segmentation from 2007 to 2017, finding that neural network methods have not outperformed traditional supervised learning, with the key challenge being the balance between in-vocabulary and out-of-vocabulary word recognition.

This paper reviews the development of Chinese word segmentation (CWS) in the most recent decade, 2007-2017. Special attention is paid to the deep learning technologies that have already permeated most areas of natural language processing (NLP). The basic view we have arrived at is that, compared to traditional supervised learning methods, neural network based methods have not shown any superior performance. The most critical challenge still lies in balancing the recognition of in-vocabulary (IV) and out-of-vocabulary (OOV) words. However, as neural models have the potential to capture the essential linguistic structure of natural language, we are optimistic that significant progress may arrive in the near future.
