CL · Jun 1, 2023

Multi-Dimensional Evaluation of Text Summarization with In-Context Learning

Sameer Jain, Vaishakh Keshava, Swarnashree Mysore Sathyendra, Patrick Fernandes, Pengfei Liu, Graham Neubig, Chunting Zhou

CMU

arXiv:2306.01200v1 · 28.0241 citations · h-index: 91

Originality: Incremental advance

AI Analysis

This work addresses the challenge of evaluating natural language generation (NLG) output efficiently and accurately, a problem that matters mainly to NLP researchers and practitioners. It is an incremental advance, since it builds on existing in-context learning methods.

The paper tackles multi-dimensional evaluation of text summarization by using large language models with in-context learning, which removes the need for large training datasets. It shows that this approach is competitive with learned evaluation frameworks and achieves state-of-the-art results on dimensions such as relevance and factual consistency.

Evaluation of natural language generation (NLG) is complex and multi-dimensional. Generated text can be evaluated for fluency, coherence, factuality, or any other dimension of interest. Most frameworks that perform such multi-dimensional evaluation require training on large manually or synthetically generated datasets. In this paper, we study the efficacy of large language models as multi-dimensional evaluators using in-context learning, obviating the need for large training datasets. Our experiments show that in-context learning-based evaluators are competitive with learned evaluation frameworks for the task of text summarization, establishing the state of the art on dimensions such as relevance and factual consistency. We then analyze the effects of factors such as the selection and number of in-context examples on performance. Finally, we study the efficacy of in-context learning-based evaluators in evaluating zero-shot summaries written by large language models such as GPT-3.
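Claims such as "competitive with learned evaluation frameworks" are usually checked by correlating an evaluator's scores with human judgments of the same summaries. The abstract does not name the statistic, so the sketch below assumes Spearman rank correlation, computed as the Pearson correlation of tie-averaged ranks. The function names are hypothetical, and the scores in the usage lines are made-up illustrative numbers, not results from the paper. Re-running the same computation with prompts built from different numbers or selections of in-context examples is one way to study the factors the abstract mentions.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; raises ValueError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation, handling ties by average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    evaluator_scores = [4.0, 2.0, 5.0, 3.0]  # illustrative only
    human_scores = [3.5, 1.0, 4.0, 3.5]      # illustrative only
    print(round(spearman(evaluator_scores, human_scores), 4))
```

Rank correlation is a common choice here because evaluators and human annotators often use different score scales, and only the ordering of summaries needs to agree.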

