Jiaqian Hu

9.6 CL May 20, 2025 Code

MAATS: A Multi-Agent Automated Translation System Based on MQM Evaluation

George Wang, Jiaqian Hu, Safinah Ali

We present MAATS, a Multi-Agent Automated Translation System that leverages the Multidimensional Quality Metrics (MQM) framework as a fine-grained signal for error detection and refinement. MAATS employs multiple specialized AI agents, each focused on a distinct MQM category (e.g., Accuracy, Fluency, Style, Terminology), followed by a synthesis agent that integrates the annotations to iteratively refine translations. This design contrasts with conventional single-agent methods that rely on self-correction. Evaluated across diverse language pairs and Large Language Models (LLMs), MAATS outperforms zero-shot and single-agent baselines with statistically significant gains in both automatic metrics and human assessments. It excels particularly in semantic accuracy, locale adaptation, and linguistically distant language pairs. Qualitative analysis highlights its strengths in multi-layered error diagnosis, omission detection across perspectives, and context-aware refinement. By aligning modular agent roles with interpretable MQM dimensions, MAATS narrows the gap between black-box LLMs and human translation workflows, shifting focus from surface fluency to deeper semantic and contextual fidelity.
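The annotate-then-synthesize loop described in this abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Annotation`, `make_rule_agent`, `synthesize`, and `refine` are hypothetical, and the agents here are simple string-matching stand-ins for what would be LLM-backed agents in a real system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Annotation:
    category: str    # MQM category, e.g. "Accuracy" or "Fluency"
    span: str        # problematic text found in the draft
    suggestion: str  # proposed replacement for that span


# Hypothetical agent type: (source, draft) -> list of annotations.
# A real agent would prompt an LLM; here it is a plain function.
Agent = Callable[[str, str], List[Annotation]]


def make_rule_agent(category: str, rules: Dict[str, str]) -> Agent:
    """Build a toy agent that flags any known bad span present in the draft."""
    def agent(source: str, draft: str) -> List[Annotation]:
        return [Annotation(category, bad, good)
                for bad, good in rules.items() if bad in draft]
    return agent


def synthesize(draft: str, annotations: List[Annotation]) -> str:
    """Toy synthesis step: apply every suggested replacement in order."""
    for a in annotations:
        draft = draft.replace(a.span, a.suggestion)
    return draft


def refine(source: str, draft: str, agents: List[Agent], max_rounds: int = 3) -> str:
    """Collect annotations from all agents, merge them, and repeat until clean."""
    for _ in range(max_rounds):
        annotations = [a for agent in agents for a in agent(source, draft)]
        if not annotations:
            break  # no agent reports an error, so stop refining
        draft = synthesize(draft, annotations)
    return draft


agents = [
    make_rule_agent("Accuracy", {"dog": "cat"}),
    make_rule_agent("Fluency", {"cat black": "black cat"}),
]
result = refine("Le chat noir dort.", "The dog black sleeps.", agents)
print(result)
```

In this toy run the Fluency rule only fires in the second round, after the Accuracy fix has been applied, which is one reason an iterative loop can catch errors that a single pass would miss.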

8.3 HC Apr 28

CHORUS: Effort-Aware Multi-Agent Human-AI Collaboration for Professional Translation

George X. Wang, Jiaqian Hu, Guande Wu, Jing Qian

Despite the widespread use of automatic AI translation systems in daily language tasks, professional translation remains crucial in domain-specific and high-stakes scenarios. Yet professional translators rarely rely on these systems in their everyday practice, because the systems lack detailed support for the translation process, do not match professional styles, and offer no accountability for the final outcome. To bridge the gap, we present CHORUS, a mixed-initiative translation system that supports the translation process and personal style as translators work. A formative study found that incorporating MQM theory may be beneficial for achieving professional translation, and that the system should adapt to each individual translator's idiosyncratic traits. The final within-subject study with 30 licensed English-Chinese translators found that our system reduced completion time by 33.8%, lowered translators' cognitive effort, and improved final translation quality as measured by the automatic evaluation metrics BLEU and COMET. Qualitative analysis of participants' feedback also revealed that the system made translation issues easier to inspect, reduced repeated prompting compared to single-agent AI systems, and offered reflections on their habits and traits. Our findings illustrate how multi-agent AI systems can be designed to support expert workflows and their potential for professional use.
