CL | Oct 14, 2024

Beyond Human-Only: Evaluating Human-Machine Collaboration for Collecting High-Quality Translation Data

arXiv:2410.11056v1 | 25 citations | h-index: 12 | WMT
Originality: Incremental advance
AI Analysis

The paper addresses the cost and speed of collecting data for machine translation systems, offering an incremental improvement over traditional human-only methods.

The study evaluates 11 approaches to collecting high-quality translation data, including human-only, machine-only, and hybrid methods. It finds that human-machine collaboration can match or exceed human-only quality while being more cost-efficient, with some methods achieving top-tier quality at around 60% of the cost of traditional methods.

Collecting high-quality translations is crucial for the development and evaluation of machine translation systems. However, traditional human-only approaches are costly and slow. This study presents a comprehensive investigation of 11 approaches for acquiring translation data, including human-only, machine-only, and hybrid approaches. Our findings demonstrate that human-machine collaboration can match or even exceed the quality of human-only translations, while being more cost-efficient. Error analysis reveals the complementary strengths between human and machine contributions, highlighting the effectiveness of collaborative methods. Cost analysis further demonstrates the economic benefits of human-machine collaboration methods, with some approaches achieving top-tier quality at around 60% of the cost of traditional methods. We release a publicly available dataset containing nearly 18,000 segments of varying translation quality with corresponding human ratings to facilitate future research.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
