MetaMetrics-MT: Tuning Meta-Metrics for Machine Translation via Human Preference Calibration
This work addresses the challenge of accurately evaluating machine translation, a concern for both researchers and practitioners, though it appears incremental: it enhances existing metrics rather than introducing a fundamentally new approach.
The paper tackles machine translation evaluation by developing MetaMetrics-MT, a metric calibrated to human preferences using Bayesian optimization with Gaussian Processes. On the WMT24 dataset it outperforms all existing baselines, setting a new state of the art in the reference-based setting.
We present MetaMetrics-MT, an innovative metric designed to evaluate machine translation (MT) tasks by aligning closely with human preferences through Bayesian optimization with Gaussian Processes. MetaMetrics-MT enhances existing MT metrics by optimizing their correlation with human judgments. Our experiments on the WMT24 metric shared task dataset demonstrate that MetaMetrics-MT outperforms all existing baselines, setting a new benchmark for state-of-the-art performance in the reference-based setting. Furthermore, it achieves results comparable to leading metrics in the reference-free setting while offering greater efficiency.
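To make the calibration idea concrete, the following is a minimal, standard-library-only sketch of tuning metric weights with Gaussian Process based Bayesian optimization. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not specify the correlation measure (Kendall's tau here), the weight range ([0, 1] per metric), the RBF kernel, the expected-improvement acquisition function, or the initial design that scores each metric on its own; all of these are assumptions. Every function name is hypothetical, and the data is synthetic.

```python
import math
import random

def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / total pairs."""
    n, s = len(x), 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

def objective(weights, metric_scores, human):
    """Correlation between the weighted metric combination and human scores."""
    combined = [sum(w * m for w, m in zip(weights, row)) for row in metric_scores]
    return kendall_tau(combined, human)

def rbf(a, b, length=0.3):
    """RBF kernel with unit prior variance (assumed kernel choice)."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)) / (2 * length ** 2))

def cholesky(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution for L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper(L, y):
    """Back substitution for L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_fit(X, y, noise=1e-3):
    """Fit a GP to mean-centered observations; returns cached factors."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    mean_y = sum(y) / len(y)
    alpha = solve_upper(L, solve_lower(L, [v - mean_y for v in y]))
    return L, alpha, mean_y

def gp_predict(X, fit, x_new):
    """Posterior mean and standard deviation at x_new."""
    L, alpha, mean_y = fit
    k_star = [rbf(a, x_new) for a in X]
    mu = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(1.0 - sum(t * t for t in v), 1e-12)
    return mu, math.sqrt(var)

def expected_improvement(mu, sigma, best):
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return (mu - best) * cdf + sigma * pdf

def tune_weights(metric_scores, human, n_random=5, n_iter=15, n_cand=100, seed=0):
    rng = random.Random(seed)
    dim = len(metric_scores[0])
    # Initial design (assumption): each metric alone, plus random weight vectors.
    X = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    X += [[rng.random() for _ in range(dim)] for _ in range(n_random)]
    y = [objective(w, metric_scores, human) for w in X]
    for _ in range(n_iter):
        fit, best = gp_fit(X, y), max(y)
        # Maximize the acquisition over random candidates in [0, 1]^dim.
        cands = [[rng.random() for _ in range(dim)] for _ in range(n_cand)]
        nxt = max(cands, key=lambda c: expected_improvement(*gp_predict(X, fit, c), best))
        X.append(nxt)
        y.append(objective(nxt, metric_scores, human))
    i = max(range(len(y)), key=y.__getitem__)
    return X[i], y[i]

# Synthetic toy data (NOT from the paper): three hypothetical metrics that
# track the human score with low noise, high noise, and not at all.
data_rng = random.Random(42)
human = [data_rng.gauss(0, 1) for _ in range(40)]
metric_scores = [[h + data_rng.gauss(0, 0.5), h + data_rng.gauss(0, 1.0),
                  data_rng.gauss(0, 1)] for h in human]
best_weights, best_tau = tune_weights(metric_scores, human)
single_taus = [kendall_tau([row[j] for row in metric_scores], human) for j in range(3)]
print("tuned weights:", best_weights)
print("tuned tau:", best_tau, "single-metric taus:", single_taus)
```

Because the initial design evaluates each metric by itself, the returned combination correlates with the human scores at least as well as the best single metric on the tuning data. Whether that gain carries over to held-out data is a separate question this sketch does not address. In practice, an off-the-shelf optimizer such as scikit-optimize's `gp_minimize` would likely replace the hand-rolled Gaussian Process shown here.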