CL, AI | May 23, 2023

CTQScorer: Combining Multiple Features for In-context Example Selection for Machine Translation

arXiv:2305.14105v2139 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of optimizing example selection for machine translation in large language models, representing an incremental advancement over previous single-feature methods.

The paper addresses in-context example selection for machine translation by proposing CTQScorer, a framework that combines multiple features to improve translation quality. It significantly outperforms random selection and single-feature baselines, with an average improvement of over 2.5 COMET points over a BM25 retrieval-based baseline.

Large language models have demonstrated the capability to perform machine translation when prompted with a few examples (in-context learning). Translation quality depends on various features of the selected examples, such as their quality and relevance, but previous work has predominantly focused on individual features in isolation. In this paper, we propose a general framework for combining different features influencing example selection. We learn a regression model, CTQ Scorer (Contextual Translation Quality), that selects examples based on multiple features in order to maximize the translation quality. On multiple language pairs and language models, we show that CTQ Scorer significantly outperforms random selection as well as strong single-factor baselines reported in the literature. We also see an improvement of over 2.5 COMET points on average with respect to a strong BM25 retrieval-based baseline.
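
The idea of scoring candidate examples with a regression model over several features can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature set, model architecture, or training data, so the two features here (a similarity score and a quality score), the plain linear model, the toy numbers, and all function names are assumptions made purely for illustration.

```python
from typing import List, Sequence, Tuple


def fit_linear_scorer(
    features: Sequence[Sequence[float]],
    targets: Sequence[float],
    lr: float = 0.1,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit a linear model (weights, bias) by batch gradient descent on mean squared error."""
    n = len(features)
    n_features = len(features[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, targets):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for j, xj in enumerate(x):
                grad_w[j] += 2.0 * err * xj / n
            grad_b += 2.0 * err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted translation quality for one candidate's feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def select_examples(
    w: Sequence[float],
    b: float,
    candidates: Sequence[Tuple[str, Sequence[float]]],
    k: int,
) -> List[str]:
    """Return the ids of the k candidates with the highest predicted quality."""
    ranked = sorted(candidates, key=lambda c: score(w, b, c[1]), reverse=True)
    return [cand_id for cand_id, _ in ranked[:k]]


# Hypothetical training data: [similarity, quality] per example, paired with
# a made-up quality score of the translation obtained using that example.
train_x = [[0.9, 0.8], [0.2, 0.9], [0.8, 0.1], [0.1, 0.2], [0.5, 0.5]]
train_y = [0.79, 0.47, 0.53, 0.21, 0.50]

# Hypothetical candidate pool for a new input sentence.
candidates = [
    ("ex_a", [0.9, 0.2]),
    ("ex_b", [0.3, 0.9]),
    ("ex_c", [0.7, 0.7]),
    ("ex_d", [0.1, 0.1]),
]

weights, bias = fit_linear_scorer(train_x, train_y)
print(select_examples(weights, bias, candidates, k=2))
```

In this sketch the learned weights decide how much each feature counts, so a candidate that is moderately strong on both features can outrank one that is strong on only one. A single-feature baseline would correspond to ranking on one column alone.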

Code Implementations: 1 repo
