CL · Jan 22, 2022

Chinese Word Segmentation with Heterogeneous Graph Neural Network

arXiv:2201.08975v1 · 5 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of effectively integrating multi-level linguistic information for Chinese word segmentation. It is an incremental advance: it builds on existing deep learning methods by incorporating structural features of the external information.

The paper tackles Chinese word segmentation by proposing HGNSeg, a framework that integrates multi-level external information using pre-trained language models and heterogeneous graph neural networks, achieving improved performance on six benchmark datasets and showing strong ability to alleviate the out-of-vocabulary problem in cross-domain scenarios.

In recent years, deep learning has achieved significant success in the Chinese word segmentation (CWS) task. Most of these methods improve CWS performance by leveraging external information such as words, sub-words, and syntax. However, existing approaches fail to integrate this multi-level linguistic information effectively, and they ignore the structural features of the external information. In this paper, we therefore propose HGNSeg, a framework to improve CWS. It fully exploits multi-level external information using a pre-trained language model and a heterogeneous graph neural network. Experimental results on six benchmark datasets (e.g., Bakeoff 2005, Bakeoff 2008) show that our approach effectively improves the performance of Chinese word segmentation. Importantly, in cross-domain scenarios, our method also shows a strong ability to alleviate the out-of-vocabulary (OOV) problem.
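To make the idea of a heterogeneous graph over multi-level information more concrete, here is a minimal sketch in plain Python. It is not the HGNSeg construction, which this abstract does not describe. It assumes two node types (characters and lexicon-matched words) and two edge types (adjacent characters, and a word linked to the characters it covers). The function name `build_hetero_graph`, the edge-type labels, the `max_word_len` parameter, and the toy lexicon are all hypothetical.

```python
from collections import defaultdict


def build_hetero_graph(sentence, lexicon, max_word_len=4):
    """Build a toy character/word graph (illustrative assumption, not HGNSeg)."""
    chars = list(sentence)
    words = []                  # (start, end, text) for each lexicon match
    edges = defaultdict(list)   # edge type -> list of (src, dst) index pairs

    # Character-character edges connect neighbouring characters.
    for i in range(len(chars) - 1):
        edges[("char", "next", "char")].append((i, i + 1))

    # Word-character edges connect each matched word to the characters it spans.
    for start in range(len(chars)):
        for length in range(2, max_word_len + 1):
            end = start + length
            if end > len(chars):
                break
            candidate = sentence[start:end]
            if candidate in lexicon:
                word_id = len(words)
                words.append((start, end, candidate))
                for j in range(start, end):
                    edges[("word", "contains", "char")].append((word_id, j))

    return chars, words, dict(edges)


# Hypothetical toy lexicon and a classic ambiguous sentence.
toy_lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
chars, words, edges = build_hetero_graph("南京市长江大桥", toy_lexicon)
for start, end, text in words:
    print(start, end, text)
```

In a full model, node features would typically come from a pre-trained language model, and a graph neural network would pass messages along each edge type separately. The overlapping matches "市长" and "长江" show why such structure may help: both words compete for the character "长", and the graph keeps both candidates visible to the model.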

