CL · AI · LG · Nov 20, 2023

Exploring Prompting Large Language Models as Explainable Metrics

arXiv:2311.11552v1 · 106 citations · h-index: 3
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for explainable evaluation metrics in NLP, specifically for summarization, though its contribution appears incremental because it builds on existing prompting methods.

The authors address the problem of evaluating text summarization quality by proposing a zero-shot prompt-based strategy using Large Language Models (LLMs); their best prompts achieve a Kendall correlation of 0.477 with human evaluations on the test data.
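Kendall correlation measures how often a metric and the human judges order pairs of summaries the same way. The sketch below computes Kendall's tau-b (the tie-corrected variant, which is also the default of `scipy.stats.kendalltau`) from scratch. It is an illustration under the assumption that tau-b is the relevant variant; the page does not say which variant or aggregation level the shared task used, and the scores in the usage example are made up.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b between two equal-length score lists.

    tau_b = (C - D) / sqrt((n0 - Tx) * (n0 - Ty)), where C and D count
    concordant and discordant pairs, n0 = n(n-1)/2 is the number of pairs,
    and Tx / Ty count pairs tied in xs / ys respectively.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1  # both lists order this pair the same way
        elif dx * dy < 0:
            discordant += 1  # the lists disagree on this pair
    n0 = len(xs) * (len(xs) - 1) // 2
    denom = sqrt((n0 - ties_x) * (n0 - ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when one list is entirely tied")
    return (concordant - discordant) / denom


# Hypothetical example: metric scores vs. human ratings for four summaries.
metric_scores = [1, 2, 3, 4]
human_scores = [1, 3, 2, 4]
print(round(kendall_tau_b(metric_scores, human_scores), 4))  # 0.6667
```

Here five of the six pairs agree and one disagrees, giving (5 - 1) / 6. The pairwise loop is O(n²), which is fine for illustration; a library routine such as `scipy.stats.kendalltau` is the practical choice for larger evaluation sets.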

This paper describes the IUST NLP Lab submission to the Prompting Large Language Models as Explainable Metrics Shared Task at the Eval4NLP 2023 Workshop on Evaluation & Comparison of NLP Systems. We propose a zero-shot prompt-based strategy for explainable evaluation of the summarization task using Large Language Models (LLMs). Our experiments, which employ both few-shot and zero-shot approaches, demonstrate the promising potential of LLMs as evaluation metrics in Natural Language Processing (NLP), particularly for summarization. Our best prompts achieved a Kendall correlation of 0.477 with human evaluations in the text summarization task on the test data. Code and results are publicly available on GitHub.
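The page does not reproduce the authors' prompts, so the following is only a minimal hypothetical sketch of how a zero-shot, prompt-based metric can be wired up: fill a template with the source text and the summary, send it to an LLM, and parse a numeric score plus explanation from the reply. The template wording, the 0 to 100 scale, and the names `build_prompt`, `parse_score`, and `score_summary` are all assumptions, and the LLM is an arbitrary callable rather than any real API. A few-shot variant would prepend a handful of already-scored examples to the same template.

```python
import re
from typing import Callable, Optional

# Hypothetical zero-shot template (NOT the prompt used in the paper).
ZERO_SHOT_TEMPLATE = (
    "Evaluate the quality of the following summary of the source text.\n"
    "Consider relevance, consistency, fluency and coherence.\n"
    "Reply with a score from 0 (worst) to 100 (best), "
    "followed by a one-sentence explanation.\n\n"
    "Source text:\n{source}\n\n"
    "Summary:\n{summary}\n\n"
    "Score:"
)


def build_prompt(source: str, summary: str) -> str:
    """Fill the hypothetical template with one source/summary pair."""
    return ZERO_SHOT_TEMPLATE.format(source=source, summary=summary)


def parse_score(response: str, lo: float = 0.0, hi: float = 100.0) -> Optional[float]:
    """Take the first number in the reply and clamp it to [lo, hi].

    Returns None when the reply contains no number, so callers can
    decide how to handle unparseable outputs.
    """
    match = re.search(r"-?\d+(?:\.\d+)?", response)
    if match is None:
        return None
    return min(max(float(match.group()), lo), hi)


def score_summary(llm: Callable[[str], str], source: str, summary: str) -> Optional[float]:
    """Score one summary with any text-in/text-out LLM callable."""
    return parse_score(llm(build_prompt(source, summary)))


# Usage with a stand-in "LLM" that returns a fixed reply.
fake_llm = lambda prompt: "85. The summary keeps the main point but omits a detail."
print(score_summary(fake_llm, "Full article text ...", "Short summary ..."))  # 85.0
```

Scores produced this way for many summaries could then be correlated with human ratings using Kendall's tau. Taking the first number is a deliberately simple parsing rule; a reply such as "On a 0 to 100 scale, I give 85" would be misread, which is one reason prompts usually ask for the score first.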

Code Implementations: 1 repo