CLOct 21, 2023

GEMBA-MQM: Detecting Translation Quality Error Spans with GPT-4

Microsoft
arXiv:2310.13988v1179 citationsh-index: 31
Originality Incremental advance
AI Analysis

This addresses translation quality estimation for researchers and practitioners, but it is incremental as it builds on existing LLM methods with a language-agnostic prompt approach.

The paper tackles translation quality error detection by introducing GEMBA-MQM, a GPT-4-based metric that uses three-shot prompting to mark error spans without human references, achieving state-of-the-art accuracy for system ranking.

This paper introduces GEMBA-MQM, a GPT-based evaluation metric designed to detect translation quality errors, specifically for the quality estimation setting without the need for human reference translations. Based on the power of large language models (LLM), GEMBA-MQM employs a fixed three-shot prompting technique, querying the GPT-4 model to mark error quality spans. Compared to previous works, our method has language-agnostic prompts, thus avoiding the need for manual prompt preparation for new languages. While preliminary results indicate that GEMBA-MQM achieves state-of-the-art accuracy for system ranking, we advise caution when using it in academic works to demonstrate improvements over other methods due to its dependence on the proprietary, black-box GPT model.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes