Enhanced Bilingual Evaluation Understudy
This work addresses the need for more adjustable and robust evaluation metrics in statistical machine translation, though the contribution appears incremental, since it builds directly on the BLEU metric.
The research tackles BLEU's limited ability to capture human-like variations in machine translation by proposing an enhanced version that accounts for synonyms, word order, and style, and it reports improvements over existing metrics as well as correlation with them.
Our research extends the Bilingual Evaluation Understudy (BLEU) evaluation technique for statistical machine translation (SMT) to make it more adjustable and robust, with the aim of bringing it closer to human evaluation. We run experiments comparing the performance of our technique against the primary existing evaluation methods, and we describe the improvements it makes over these methods as well as its correlation with them. When human translators translate a text, they often use synonyms, different word orders or styles, and other similar variations. We propose an SMT evaluation technique that enhances the BLEU metric to account for such variations.
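BLEU scores a candidate translation by its clipped n-gram precision against one or more references, combining the precisions with a geometric mean and a brevity penalty. The abstract does not say how the enhanced metric handles synonyms, so the following is only a minimal Python sketch under one stated assumption: a hypothetical synonym table (`SYNONYMS`) maps each word to a canonical form before n-grams are counted. The table and function names are illustrative, not the authors' method, and word order and style variations are not modeled.

```python
import math
from collections import Counter

# Hypothetical synonym table (an assumption for illustration only):
# each word maps to one canonical representative of its synonym class.
SYNONYMS = {"big": "large", "large": "large",
            "car": "automobile", "automobile": "automobile"}

def normalize(tokens, synonyms):
    """Replace each token with its canonical synonym, or keep it if unknown."""
    return [synonyms.get(t, t) for t in tokens]

def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, references, n):
    """BLEU's clipped n-gram precision for one candidate sentence."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, the maximum count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())

def bleu(candidate, references, max_n=4, synonyms=None):
    """Sentence-level BLEU; if a synonym table is given, normalize tokens first."""
    if synonyms is not None:
        candidate = normalize(candidate, synonyms)
        references = [normalize(r, synonyms) for r in references]
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    # Uniform-weight geometric mean of the n-gram precisions.
    log_avg = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Closest reference length, preferring the shorter one on ties.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_avg)

ref = "the large automobile is red".split()
cand = "the big car is red".split()
print(bleu(cand, [ref], max_n=2))                     # plain BLEU
print(bleu(cand, [ref], max_n=2, synonyms=SYNONYMS))  # synonym-normalized
```

With this toy table, "the big car is red" scored against "the large automobile is red" gets about 0.39 from plain BLEU at `max_n=2` (unigram precision 3/5, bigram precision 1/4) and 1.0 after synonym normalization. Mapping each word to a single canonical form ignores polysemy; a more careful treatment would likely need word-sense information.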