VCWE: Visual Character-Enhanced Word Embeddings
VCWE addresses the challenge of capturing the syntactic and semantic information in Chinese text for NLP applications, and represents an incremental improvement over existing methods.
The paper tackles the problem of learning Chinese word embeddings by incorporating visual character-shape information, and reports superior performance on word similarity, sentiment analysis, named entity recognition, and part-of-speech tagging tasks.
Chinese uses a logographic writing system, and the shape of a Chinese character contains rich syntactic and semantic information. In this paper, we propose a model to learn Chinese word embeddings via three-level composition: (1) a convolutional neural network to extract the intra-character compositionality from the visual shape of a character; (2) a recurrent neural network with self-attention to compose character representations into word embeddings; (3) the Skip-Gram framework to capture non-compositionality directly from the contextual information. Evaluations demonstrate the superior performance of our model on four tasks: word similarity, sentiment analysis, named entity recognition, and part-of-speech tagging.
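The three-level composition can be illustrated with a minimal, dependency-free Python sketch. It is not the paper's implementation: the function names (`conv_char_features`, `attention_pool`, `sgns_loss`), the toy bitmap and kernel sizes, and the random values are all hypothetical. The recurrent layer of step (2) is omitted for brevity, so only the attention pooling is shown, and the negative-sampling form of the Skip-Gram objective is an assumption, since the abstract names only the Skip-Gram framework.

```python
import math
import random


def conv_char_features(bitmap, kernels):
    """Step (1) stand-in: valid 2D convolution, ReLU, then global max pooling.

    Returns one feature per kernel for a single character bitmap.
    """
    h, w = len(bitmap), len(bitmap[0])
    feats = []
    for k in kernels:
        kh, kw = len(k), len(k[0])
        best = 0.0  # ReLU floor: max over ReLU(s) equals max(0, max s)
        for i in range(h - kh + 1):
            for j in range(w - kw + 1):
                s = sum(bitmap[i + a][j + b] * k[a][b]
                        for a in range(kh) for b in range(kw))
                best = max(best, s)
        feats.append(best)
    return feats


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(char_vecs, query):
    """Step (2), simplified: attention-weighted sum of character vectors.

    The paper places a recurrent network before the attention; it is
    left out here, so this is only the pooling part.
    """
    weights = softmax([dot(vec, query) for vec in char_vecs])
    dim = len(char_vecs[0])
    word = [sum(w * vec[d] for w, vec in zip(weights, char_vecs))
            for d in range(dim)]
    return word, weights


def sgns_loss(word_vec, context_vec, negative_vecs):
    """Step (3): Skip-Gram loss with negative sampling (assumed variant)."""
    def log_sigmoid(x):
        return -math.log1p(math.exp(-x))

    loss = -log_sigmoid(dot(word_vec, context_vec))
    for neg in negative_vecs:
        loss -= log_sigmoid(-dot(word_vec, neg))
    return loss


# Toy usage: a two-character word with 8x8 bitmaps and four 3x3 kernels.
random.seed(0)
bitmaps = [[[random.random() for _ in range(8)] for _ in range(8)]
           for _ in range(2)]
kernels = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(4)]
char_vecs = [conv_char_features(b, kernels) for b in bitmaps]
query = [random.uniform(-1, 1) for _ in range(4)]
word_vec, weights = attention_pool(char_vecs, query)
context = [random.uniform(-1, 1) for _ in range(4)]
negatives = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
loss = sgns_loss(word_vec, context, negatives)
print("attention weights:", weights)
print("loss:", loss)
```

In a trained model the kernels, the attention query, and the context vectors would be learned jointly by minimizing this loss over a corpus; here they are random so that the data flow from character image to word vector to Skip-Gram objective is visible in a few lines.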