Identifying Semantic Divergences in Parallel Text without Annotations
The paper addresses the challenge of identifying meaning differences between source sentences and their translations, a problem relevant to machine translation systems, but its contribution is incremental, since it builds on existing methods for semantic similarity.
The paper detects semantic divergences in parallel text without manual annotation by using a deep neural model of bilingual semantic similarity. It shows that this model identifies divergences more accurately than models based on surface features, and that these divergences affect neural machine translation.
Recognizing that even correct translations are not always semantically equivalent, we automatically detect meaning divergences in parallel sentence pairs with a deep neural model of bilingual semantic similarity which can be trained for any parallel corpus without any manual annotation. We show that our semantic model detects divergences more accurately than models based on surface features derived from word alignments, and that these divergences matter for neural machine translation.
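The abstract does not say how a similarity model can be trained "without any manual annotation". One common way to get labels for free is to derive them from the parallel corpus itself. The sketch below illustrates that idea under stated assumptions and is not the paper's documented procedure. It treats every aligned pair as equivalent and synthesizes divergent pairs in two ways: by pairing a source with another pair's target, and by deleting a span of the target. The function name `make_synthetic_examples` and the 0/1 label scheme are hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical label scheme: 0 = semantically equivalent, 1 = divergent.
EQUIVALENT, DIVERGENT = 0, 1


def make_synthetic_examples(
    corpus: List[Tuple[str, str]], seed: int = 0
) -> List[Tuple[str, str, int]]:
    """Derive labeled (source, target, label) examples from unannotated parallel text.

    Assumption (illustrative only): each corpus pair counts as equivalent, and
    divergent examples are synthesized by (a) mismatching a source with another
    pair's target and (b) deleting a contiguous span of the target.
    """
    rng = random.Random(seed)  # seeded for reproducible sampling
    examples = []
    n = len(corpus)
    for i, (src, tgt) in enumerate(corpus):
        # The original aligned pair serves as an "equivalent" example.
        examples.append((src, tgt, EQUIVALENT))

        if n > 1:
            # (a) Pick a different index j != i uniformly at random.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            examples.append((src, corpus[j][1], DIVERGENT))

        tokens = tgt.split()
        if len(tokens) > 1:
            # (b) Delete between 1 token and half the tokens, so some remain.
            span = rng.randint(1, len(tokens) // 2)
            start = rng.randrange(len(tokens) - span + 1)
            kept = tokens[:start] + tokens[start + span:]
            examples.append((src, " ".join(kept), DIVERGENT))
    return examples


if __name__ == "__main__":
    corpus = [("le chat dort", "the cat sleeps"), ("il pleut", "it is raining")]
    for example in make_synthetic_examples(corpus):
        print(example)
```

A bilingual similarity model trained on such examples could then flag a real sentence pair as divergent when its predicted divergence probability exceeds a chosen threshold. The synthetic perturbations only approximate real translation divergences, which is why the abstract's comparison against alignment-based surface features matters.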