CL · Feb 18, 2020

A New Clustering neural network for Chinese word segmentation

arXiv:2002.07458v1
AI Analysis

This paper addresses Chinese word segmentation (CWS) for NLP applications, but the contribution appears incremental because it builds on existing neural components such as LSTM and self-attention.

The authors tackled Chinese word segmentation by reframing it as a clustering problem rather than a labeling problem. They report an F-score of 98% when no out-of-vocabulary (OOV) words are present and 85-95% when OOV words are present, measured on the training datasets.
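CWS scores like these are commonly computed at the word level: each word is treated as a character span, and a predicted word counts as correct only if both of its boundaries match the gold segmentation. The summary does not say which scorer the authors used, so the sketch below assumes this span-matching definition; the function names `word_spans` and `segmentation_f1` are hypothetical.

```python
from typing import List, Set, Tuple


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a segmentation into a set of (start, end) character offsets."""
    spans: Set[Tuple[int, int]] = set()
    pos = 0
    for w in words:
        spans.add((pos, pos + len(w)))
        pos += len(w)
    return spans


def segmentation_f1(gold: List[str], pred: List[str]) -> float:
    """Word-level F-score: a predicted word is correct only if its span matches gold."""
    if "".join(gold) != "".join(pred):
        raise ValueError("gold and predicted segmentations must cover the same text")
    g, p = word_spans(gold), word_spans(pred)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision = correct / len(p)
    recall = correct / len(g)
    return 2 * precision * recall / (precision + recall)


# Gold has 4 words; the prediction merges the last two, so 2 of 3 predicted
# words are correct (P = 2/3) and 2 of 4 gold words are found (R = 1/2).
print(segmentation_f1(["我", "爱", "北京", "天安门"], ["我", "爱", "北京天安门"]))
```

Because a merged or split word shifts a boundary, one segmentation error typically costs both precision and recall, which is one reason unseen (OOV) words can pull the F-score down sharply.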

In this article I propose a new model for Chinese word segmentation (CWS), which may have the potential to be applied in other domains in the future. Compared with previous work, it takes a new approach to CWS: segmentation is treated as a clustering problem instead of a labeling problem. In this model, LSTM and self-attention structures collect context and sentence-level features in every layer; after several layers, a clustering model splits the characters into groups, which are the final segmentation results. I call this model CLNN. The algorithm reaches an F-score of 98 percent (without OOV words) and 85 to 95 percent (with OOV words) on the training data sets. Error analysis shows that OOV words greatly reduce performance, which calls for deeper research in the future.
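The abstract does not specify which clustering algorithm CLNN applies after the LSTM and self-attention layers. As a rough illustration of "segmentation as clustering" only, the sketch below assumes those layers have already produced one feature vector per character, and substitutes a simple contiguity-constrained rule: a character joins its left neighbour's cluster when their cosine similarity reaches a threshold. The names `cluster_segment` and `threshold` and the toy vectors are hypothetical; this is not the CLNN clustering model.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_segment(chars: str, feats: List[List[float]], threshold: float = 0.8) -> List[str]:
    """Group adjacent characters into words with a hypothetical clustering rule.

    Each character either joins the cluster of its left neighbour (when their
    feature vectors are similar enough) or starts a new cluster. Only neighbours
    are compared, so every cluster is a contiguous span, i.e. a candidate word.
    """
    if len(chars) != len(feats):
        raise ValueError("need one feature vector per character")
    words: List[str] = []
    current = ""
    for i, ch in enumerate(chars):
        if i > 0 and cosine(feats[i - 1], feats[i]) >= threshold:
            current += ch  # same cluster as the previous character
        else:
            if current:
                words.append(current)  # close the previous cluster
            current = ch  # start a new cluster
    if current:
        words.append(current)
    return words


# Toy 2-d "encoder outputs" (assumed): 北 and 京 point in nearly the same direction.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 1.1]]
print(cluster_segment("我爱北京", feats))
```

Restricting merges to adjacent characters is the design choice that keeps each cluster a valid word; an unconstrained method such as k-means could place non-adjacent characters in the same group, which would not correspond to any segmentation.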
