CL
Aug 16, 2017

Learning Chinese Word Representations From Glyphs Of Characters

arXiv:1708.04755v11101 citations
Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of enhancing Chinese word representations for natural language processing tasks, though it appears incremental as it builds on existing character embedding methods.

The paper learns Chinese word representations by incorporating character glyph features, which a convolutional auto-encoder derives directly from character bitmaps. These glyph features further improve word representations that are already enhanced by character embeddings. The paper also creates and releases several evaluation datasets in traditional Chinese.

In this paper, we propose new methods to learn Chinese word representations. Chinese characters are composed of graphical components, which carry rich semantics. It is common for a Chinese learner to comprehend the meaning of a word from these graphical components. As a result, we propose models that enhance word representations with character glyphs. The character glyph features are learned directly from the bitmaps of characters by a convolutional auto-encoder (convAE), and the glyph features improve Chinese word representations that are already enhanced by character embeddings. Another contribution of this paper is that we created several evaluation datasets in traditional Chinese and made them public.
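
To make the idea in the abstract concrete, here is a minimal sketch of how a word vector might be enhanced with character embeddings and glyph features. It is an illustration under stated assumptions, not the paper's actual model: the function name `glyph_enhanced_word_vector`, the additive combination, the 4-dimensional toy vectors, and the random stand-ins for convAE outputs are all hypothetical.

```python
import numpy as np


def glyph_enhanced_word_vector(word, word_emb, char_emb, glyph_feat):
    """Combine a word vector with the mean of its characters' vectors.

    Each character vector is the character embedding plus a glyph feature.
    This additive composition is an assumption made for illustration.
    """
    char_vecs = [char_emb[c] + glyph_feat[c] for c in word]
    return 0.5 * (word_emb[word] + np.mean(char_vecs, axis=0))


# Toy 4-dimensional vectors, randomly initialised for illustration only.
rng = np.random.default_rng(0)
dim = 4
word = "森林"
word_emb = {word: rng.normal(size=dim)}
char_emb = {c: rng.normal(size=dim) for c in word}
# Hypothetical stand-in for glyph features that a trained convAE encoder
# would produce from character bitmaps, projected to the embedding size.
glyph_feat = {c: rng.normal(size=dim) for c in word}

vec = glyph_enhanced_word_vector(word, word_emb, char_emb, glyph_feat)
print(vec.shape)  # (4,)
```

In a real pipeline, the `glyph_feat` entries would come from an encoder trained to reconstruct character bitmaps, and all vectors would be learned jointly from a corpus rather than sampled at random.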

Code Implementations: 2 repos
