CL · Apr 3, 2024

Towards Large Language Model driven Reference-less Translation Evaluation for English and Indian Languages

Vandan Mujadia, Pruthwik Mishra, Arafat Ahsan, Dipti Misra Sharma

arXiv:2404.02512v1 · 3.46 citations · h-index: 24 · ICON

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of evaluating translation quality without reference texts, specifically for Indian languages. The contribution is incremental, since it applies existing LLM techniques to a new domain.

This paper tackles reference-less translation evaluation for English and Indian languages by using large language models (LLMs) to mimic human assessments. It finds that an LLM-based evaluator (LLaMA-2-13B) achieves a correlation with human judgments that is comparable to or higher than that of existing methods.

With the primary focus on evaluating the effectiveness of large language models for automatic reference-less translation assessment, this work presents our experiments on mimicking human direct assessment to evaluate the quality of translations in English and Indian languages. We constructed a translation evaluation task where we performed zero-shot learning, in-context example-driven learning, and fine-tuning of large language models to provide a score out of 100, where 100 represents a perfect translation and 1 represents a poor translation. We compared the performance of our trained systems with existing methods such as COMET, BERT-Scorer, and LABSE, and found that the LLM-based evaluator (LLaMA-2-13B) achieves a comparable or higher overall correlation with human judgments for the considered Indian language pairs.
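The evaluation setup described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the prompt wording, the function names `build_prompt`, `parse_score`, and `pearson`, and the choice of Pearson correlation (the abstract says only "correlation"). The model call itself is left out, so the sketch covers only the steps around it: assembling a zero-shot or in-context prompt, reading a score between 1 and 100 from the reply, and comparing predicted scores with human scores.

```python
import re
from math import sqrt


def build_prompt(source, translation, src_lang, tgt_lang, examples=()):
    """Assemble a reference-less scoring prompt (hypothetical wording)."""
    lines = [
        f"Rate the {src_lang} to {tgt_lang} translation on a scale of 1 to 100, "
        "where 100 is a perfect translation and 1 is a poor translation. "
        "Reply with the number only."
    ]
    # Optional in-context examples: (source, translation, human_score) triples.
    # With no examples, the prompt is zero-shot.
    for ex_src, ex_tgt, ex_score in examples:
        lines += [f"Source: {ex_src}", f"Translation: {ex_tgt}", f"Score: {ex_score}"]
    lines += [f"Source: {source}", f"Translation: {translation}", "Score:"]
    return "\n".join(lines)


def parse_score(reply):
    """Return the first run of digits in the reply, clamped to [1, 100].

    Returns None when the reply contains no digits.
    """
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    return max(1, min(100, int(match.group())))


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)
```

In use, the string returned by `build_prompt` would be sent to whichever LLM is being evaluated, each reply passed through `parse_score`, and the resulting list compared against human direct-assessment scores with `pearson`. Replies without a parsable number need a policy of their own (for example, dropping the segment), which this sketch leaves to the caller.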

