CL · Apr 28, 2017

Neural Word Segmentation with Rich Pretraining

arXiv:1704.08960v1 · 116 citations
Originality: Incremental advance
AI Analysis

This work addresses word segmentation for natural language processing, but it is incremental as it builds on existing neural and statistical approaches.

The paper tackles neural word segmentation by investigating how effectively rich external training sources, such as punctuation and POS, can be used to pretrain a modular model. The resulting accuracies are competitive with the best methods on six benchmarks.

Neural word segmentation research has benefited from large-scale raw texts by leveraging them for pretraining character and word embeddings. On the other hand, statistical segmentation research has exploited richer sources of external information, such as punctuation, automatic segmentation and POS. We investigate the effectiveness of a range of external training sources for neural word segmentation by building a modular segmentation model, pretraining the most important submodule using rich external sources. Results show that such pretraining significantly improves the model, leading to accuracies competitive with the best methods on six benchmarks.

Code Implementations: 1 repo
