Learning Semantic Representations for the Phrase Translation Model
This work addresses the challenge of enhancing translation accuracy for machine translation systems, though it is incremental as it builds on existing phrase-based methods.
The paper tackled the problem of improving phrase-based statistical machine translation by introducing a semantic-based phrase translation model that projects phrases into a latent semantic space and computes translation scores based on distance, resulting in a gain of 0.7-1.0 BLEU points on Europarl tasks.
This paper presents a novel semantic-based phrase translation model. A pair of source and target phrases are projected into continuous-valued vector representations in a low-dimensional latent semantic space, where their translation score is computed by the distance between the pair in this new space. The projection is performed by a multi-layer neural network whose weights are learned on parallel training data. The learning is aimed to directly optimize the quality of end-to-end machine translation results. Experimental evaluation has been performed on two Europarl translation tasks, English-French and German-English. The results show that the new semantic-based phrase translation model significantly improves the performance of a state-of-the-art phrase-based statistical machine translation sys-tem, leading to a gain of 0.7-1.0 BLEU points.