CL · LG · Apr 9, 2021

Chinese Character Decomposition for Neural MT with Multi-Word Expressions

arXiv:2104.04497
AI Analysis

This work targets translation quality for Chinese, though it is incremental: it builds on existing character-decomposition and MWE-integration methods.

This paper investigates how different levels of Chinese character decomposition (radical, stroke, and intermediate) impact neural machine translation, finding that radical-level decomposition yields the best BLEU scores (e.g., 24.5 vs. 23.8 baseline) and that decomposed multi-word expressions further improve performance.

Chinese character decomposition has been used as a feature to enhance Machine Translation (MT) models, combining radicals into character- and word-level models. Recent work has investigated ideograph- or stroke-level embedding. However, questions remain about which decomposition level of Chinese character representations, radicals or strokes, is best suited for MT. To investigate in detail the impact of Chinese decomposition embedding at the radical, stroke, and intermediate levels, and how well these decompositions represent the meaning of the original character sequences, we carry out an analysis using both automated and human evaluation of MT. Furthermore, we investigate whether combining decomposed Multiword Expressions (MWEs) can enhance model learning. MWE integration into MT has seen more than a decade of exploration; however, decomposed MWEs have not previously been explored.
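To make the decomposition levels and the decomposed-MWE idea concrete, here is a minimal Python sketch. It assumes a tiny hand-built lookup table: `COMPONENTS`, `STROKES`, the ASCII stroke codes, and the helper names `decompose` and `decompose_corpus` are all hypothetical and are not the paper's resource or code. It models only three levels (whole characters, a one-step component or radical-like split, and strokes); the intermediate levels studied in the paper would need a deeper, multi-step decomposition table. For MWEs it assumes one simple integration strategy, appending decomposed bilingual MWE pairs to the training corpus as extra sentence pairs, which is an illustration rather than the authors' pipeline.

```python
from typing import Dict, List, Tuple

# Hypothetical toy tables for illustration only; a real system would load a
# full character-decomposition resource instead of these few entries.
COMPONENTS: Dict[str, List[str]] = {
    "明": ["日", "月"],  # bright = sun + moon
    "林": ["木", "木"],  # grove = tree + tree
    "休": ["亻", "木"],  # rest = person + tree
}

# Stroke sequences in writing order, with hypothetical ASCII codes:
# H = heng (horizontal), S = shu (vertical), P = pie (left-falling),
# N = na (right-falling), HZ = hengzhe, HZG = hengzhegou.
# Stroke shapes that change inside compound characters are ignored here.
STROKES: Dict[str, List[str]] = {
    "日": ["S", "HZ", "H", "H"],
    "月": ["P", "HZG", "H", "H"],
    "木": ["H", "S", "P", "N"],
    "亻": ["P", "S"],
}

Pair = Tuple[str, str]


def decompose(text: str, level: str) -> List[str]:
    """Return the units of `text` at the 'char', 'component' or 'stroke' level."""
    if level not in ("char", "component", "stroke"):
        raise ValueError(f"unknown level: {level!r}")
    units = [ch for ch in text if not ch.isspace()]
    if level == "char":
        return units
    # One decomposition step; characters missing from the table stay whole.
    units = [part for ch in units for part in COMPONENTS.get(ch, [ch])]
    if level == "component":
        return units
    # Stroke level: expand every component; units without stroke data stay whole.
    return [s for part in units for s in STROKES.get(part, [part])]


def decompose_corpus(
    corpus: List[Pair], mwe_pairs: List[Pair], level: str
) -> List[Pair]:
    """Decompose the Chinese side of each pair and append decomposed MWE pairs.

    Assumed integration strategy: bilingual MWEs are added to the training
    data as extra (source, target) pairs after the ordinary corpus pairs.
    """
    return [
        (" ".join(decompose(src, level)), tgt)
        for src, tgt in corpus + mwe_pairs
    ]


if __name__ == "__main__":
    corpus = [("明", "bright")]
    mwes = [("明月", "bright moon")]
    for pair in decompose_corpus(corpus, mwes, "component"):
        print(pair)
```

At the component level, the corpus pair 明/"bright" becomes 日 月/"bright", and the MWE pair 明月/"bright moon" contributes 日 月 月/"bright moon". Falling back to the unit itself when a table has no entry keeps unknown characters in the sequence instead of silently dropping them.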

Code Implementations: 1 repo