Investigating Self-Attention Network for Chinese Word Segmentation
This work addresses Chinese word segmentation, a key task in natural language processing, but the contribution is incremental: it applies existing attention-based methods to this domain.
The authors investigate self-attention networks (SAN) as an alternative to BiLSTM-CRF models for Chinese word segmentation. SAN gives highly competitive results, BERT and word information improve segmentation further, and the final models give the best results on 6 heterogeneous domain benchmarks.
Neural networks have become the dominant method for Chinese word segmentation. Most existing models cast the task as sequence labeling, using BiLSTM-CRF to represent the input and make output predictions. Recently, attention-based sequence models have emerged as a highly competitive alternative to LSTMs, and they allow better running speed through parallel computation. We investigate self-attention networks (SAN) for Chinese word segmentation, comparing them with BiLSTM-CRF models. In addition, we investigate the influence of contextualized character embeddings using BERT, and propose a method for integrating word information into SAN segmentation. Results show that SAN gives highly competitive results compared with BiLSTMs, with BERT and word information further improving both in-domain and cross-domain segmentation. Our final models give the best results on 6 heterogeneous domain benchmarks.
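To make the sequence-labeling formulation concrete, the following is a minimal sketch of how a segmented sentence can be turned into per-character labels and back. It assumes the common BMES tag scheme (Begin, Middle, End, Single); the abstract does not state which tag set the authors use, and the function names words_to_tags and tags_to_words are hypothetical, chosen only for illustration.

```python
# Sketch only: BMES is an assumed tag scheme, not confirmed by the source.

def words_to_tags(words):
    """Map a segmented sentence (a list of words) to per-character BMES tags."""
    tags = []
    for word in words:
        if not word:
            continue  # ignore empty strings
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # first character B, last character E, everything between M
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Recover the word list from characters and their BMES tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends at E or S
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


if __name__ == "__main__":
    segmented = ["中国", "人民", "很", "友好"]
    labels = words_to_tags(segmented)
    print(labels)
    print(tags_to_words("".join(segmented), labels))
```

Under this view, a segmenter (BiLSTM-CRF or SAN) only has to predict one tag per input character; the word boundaries follow deterministically from the tag sequence.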