CL, Feb 23, 2019

VCWE: Visual Character-Enhanced Word Embeddings

arXiv:1902.08795v2, 1094 citations
AI Analysis

This paper addresses the challenge of capturing syntactic and semantic information in Chinese text for NLP applications. It is an incremental improvement over existing methods.

The paper tackles the problem of learning Chinese word embeddings by incorporating visual character shape information, and reports superior performance on word similarity, sentiment analysis, named entity recognition, and part-of-speech tagging tasks.

Chinese is a logographic writing system, and the shape of Chinese characters contains rich syntactic and semantic information. In this paper, we propose a model to learn Chinese word embeddings via three-level composition: (1) a convolutional neural network to extract the intra-character compositionality from the visual shape of a character; (2) a recurrent neural network with self-attention to compose character representations into word embeddings; (3) the Skip-Gram framework to capture non-compositionality directly from the contextual information. Evaluations demonstrate the superior performance of our model on four tasks: word similarity, sentiment analysis, named entity recognition and part-of-speech tagging.
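The composition described in the abstract can be illustrated with a heavily simplified sketch. It is not the authors' implementation: the character vectors below are hand-written stand-ins for what the CNN would extract from character images, the recurrent layer is omitted, and the attention is a plain dot-product softmax against a hypothetical `query` vector. The function names (`compose_word`, `sgns_loss`) and all toy values are assumptions made for illustration. The second function shows the standard Skip-Gram negative-sampling loss applied to the composed word vector.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def compose_word(char_vecs, query):
    """Attention-weighted sum of character vectors.

    char_vecs: stand-ins for per-character features (assumed CNN outputs).
    query: hypothetical attention query vector; it would be learned in a real model.
    """
    weights = softmax([dot(c, query) for c in char_vecs])
    dim = len(char_vecs[0])
    word = [sum(w * c[i] for w, c in zip(weights, char_vecs)) for i in range(dim)]
    return word, weights


def sgns_loss(word_vec, context_vec, negative_vecs):
    """Skip-Gram negative-sampling loss for one (word, context) pair."""
    loss = -math.log(sigmoid(dot(word_vec, context_vec)))
    for neg in negative_vecs:
        loss -= math.log(sigmoid(-dot(word_vec, neg)))
    return loss


# Toy two-character word with 3-dimensional character features (made-up values).
char_vecs = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
query = [0.2, 0.1, 0.0]
word_vec, weights = compose_word(char_vecs, query)

# One observed context vector and one sampled negative (made-up values).
loss = sgns_loss(word_vec, [0.5, 0.5, 0.5], [[-0.5, 0.2, 0.1]])
```

In a full model, gradients of this loss would flow back through the attention weights and into the character encoder, so that visually similar characters could end up with related representations.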

Code Implementations: 1 repo
