As Easy as 1, 2, 3: Behavioural Testing of NMT Systems for Numerical Translation
This work addresses the robustness of NMT systems when translating numerical text, which is critical for applications that require high accuracy. The contribution is incremental, however: it focuses on testing and exposing errors rather than proposing a new solution.
The paper tackles numerical mistranslation in neural machine translation systems, which can have serious consequences such as financial loss or medical misinformation. It finds that major commercial systems and state-of-the-art research models fail on many test examples, for both high- and low-resource languages.
Mistranslated numbers can have serious consequences, such as financial loss or medical misinformation. In this work we develop comprehensive assessments of the robustness of neural machine translation systems to numerical text via behavioural testing. We explore a variety of numerical translation capabilities a system is expected to exhibit and design effective test examples to expose system underperformance. We find that numerical mistranslation is a general issue: major commercial systems and state-of-the-art research models fail on many of our test examples, for both high- and low-resource languages. To the best of our knowledge, our tests reveal novel errors that have not previously been reported in NMT systems. Lastly, we discuss strategies to mitigate numerical mistranslation.
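The abstract does not say how individual test examples are scored. As an illustration only, here is one simple form a behavioural check for numerical translation could take. It extracts the numbers from a source sentence and from a system's output, then flags the example as a failure when the two differ. The regular expression, the function names `extract_numbers` and `numbers_preserved`, and the test pairs are all hypothetical and are not the paper's test suite. The sketch also assumes both sides write numbers in the same format ("," for thousands, "." for decimals).

```python
import re
from collections import Counter

# Assumption (hypothetical): source and target both use "," as the thousands
# separator and "." as the decimal point. Real test suites must handle
# locale-specific formats, number words, and units.
NUMBER_RE = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def extract_numbers(text: str) -> Counter:
    """Return a multiset of the numeric values in `text`, normalized to floats."""
    return Counter(float(m.replace(",", "")) for m in NUMBER_RE.findall(text))


def numbers_preserved(source: str, translation: str) -> bool:
    """Pass only if the translation contains exactly the same numbers as the source."""
    return extract_numbers(source) == extract_numbers(translation)


# Hypothetical (source, system output) pairs, invented for illustration.
test_cases = [
    ("The dose is 2.5 mg.", "La dosis es 2.5 mg."),
    ("Revenue rose to 1,200,000 dollars.", "Los ingresos subieron a 120,000 dólares."),
]

failures = [src for src, hyp in test_cases if not numbers_preserved(src, hyp)]
print(f"{len(failures)} of {len(test_cases)} test examples failed")
```

The sketch compares multisets rather than sets, so it also catches numbers that a system drops or duplicates. A plain set comparison would miss these.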