CL | Jan 13, 2024

Leveraging Large Language Models for NLG Evaluation: Advances and Challenges

Zhen Li, Xiaohan Xu, Tao Shen, Can Xu, Jia-Chen Gu, Yuxuan Lai, Chongyang Tao, Shuai Ma

arXiv:2401.07103v222.155 citations | h-index: 7 | Has Code | EMNLP

Originality: Synthesis-oriented

AI Analysis

The paper addresses the problem of evaluating NLG outputs, but its contribution is incremental: it organizes existing methods rather than introducing new ones.

This paper tackles the lack of systematic analysis of LLM-based NLG evaluation by providing a thorough overview and proposing a taxonomy for organizing LLM-based metrics, with the aim of offering insights to researchers.

In the rapidly evolving domain of Natural Language Generation (NLG) evaluation, introducing Large Language Models (LLMs) has opened new avenues for assessing generated content quality, e.g., coherence, creativity, and context relevance. This paper aims to provide a thorough overview of leveraging LLMs for NLG evaluation, a burgeoning area that lacks a systematic analysis. We propose a coherent taxonomy for organizing existing LLM-based evaluation metrics, offering a structured framework to understand and compare these methods. Our detailed exploration includes critically assessing various LLM-based methodologies, as well as comparing their strengths and limitations in evaluating NLG outputs. By discussing unresolved challenges, including bias, robustness, domain-specificity, and unified evaluation, this paper seeks to offer insights to researchers and advocate for fairer and more advanced NLG evaluation techniques.
