CL · AI · May 23, 2023

LLM-Eval: Unified Multi-Dimensional Automatic Evaluation for Open-Domain Conversations with Large Language Models

arXiv:2305.13711v1 · 280 citations
Originality: Incremental advance
AI Analysis

LLM-Eval offers a more efficient evaluation option for researchers and developers building open-domain conversation systems, though it is an incremental advance that builds on existing LLM-based evaluation approaches.

The paper tackles the cost and time of evaluating open-domain conversations by proposing LLM-Eval, a single prompt-based method that scores multiple quality dimensions in one model call, and reports that it is effective and efficient compared with state-of-the-art evaluation methods.
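The core idea, one prompt that asks for every quality dimension at once and one model call that returns all scores, can be sketched as follows. This is a minimal illustration, not the paper's prompt: the dimension names, the 0 to 5 range, the JSON reply format, and the functions `build_prompt`, `parse_scores`, and `evaluate` are all hypothetical, and `llm` stands in for whatever text-in, text-out model API you use.

```python
import json
from typing import Callable, Dict

# Hypothetical dimension names and score range, chosen for illustration only;
# the source says only that the schema covers multiple quality dimensions.
DIMENSIONS = ("appropriateness", "content", "grammar", "relevance")
SCORE_MIN, SCORE_MAX = 0, 5


def build_prompt(context: str, response: str) -> str:
    """Build ONE prompt that requests a score for every dimension (unified schema)."""
    schema = ", ".join(f'"{d}": <int {SCORE_MIN}-{SCORE_MAX}>' for d in DIMENSIONS)
    return (
        "Score the following dialogue response on each dimension.\n"
        f"Reply with JSON only, in the form {{{schema}}}.\n\n"
        f"Context:\n{context}\n\nResponse:\n{response}\n"
    )


def parse_scores(raw: str) -> Dict[str, int]:
    """Parse the model's JSON reply; check each dimension is present and in range."""
    data = json.loads(raw)
    scores: Dict[str, int] = {}
    for dim in DIMENSIONS:
        value = int(data[dim])  # KeyError if the model skipped a dimension
        if not SCORE_MIN <= value <= SCORE_MAX:
            raise ValueError(f"{dim} score {value} out of range")
        scores[dim] = value
    return scores


def evaluate(context: str, response: str, llm: Callable[[str], str]) -> Dict[str, int]:
    """A single model call yields scores for all dimensions."""
    return parse_scores(llm(build_prompt(context, response)))


if __name__ == "__main__":
    # Stub in place of a real LLM call (hypothetical fixed reply).
    def fake_llm(prompt: str) -> str:
        return '{"appropriateness": 4, "content": 3, "grammar": 5, "relevance": 4}'

    print(evaluate("A: Any plans for the weekend?", "B: I'm going hiking.", fake_llm))
```

Compared with issuing one prompt per dimension, this design spends one call per response regardless of how many dimensions the schema lists; the trade-off is that the reply must be parsed and validated as a whole.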

We propose LLM-Eval, a unified multi-dimensional automatic evaluation method for open-domain conversations with large language models (LLMs). Existing evaluation methods often rely on human annotations, ground-truth responses, or multiple LLM prompts, which can be expensive and time-consuming. To address these issues, we design a single prompt-based evaluation method that leverages a unified evaluation schema to cover multiple dimensions of conversation quality in a single model call. We extensively evaluate the performance of LLM-Eval on various benchmark datasets, demonstrating its effectiveness, efficiency, and adaptability compared to state-of-the-art evaluation methods. Our analysis also highlights the importance of choosing suitable LLMs and decoding strategies for accurate evaluation results. LLM-Eval offers a versatile and robust solution for evaluating open-domain conversation systems, streamlining the evaluation process and providing consistent performance across diverse scenarios.
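The abstract reports results on benchmark datasets but does not name the statistic here. A common way to meta-evaluate an automatic dialogue metric is to correlate its scores with human ratings, for example with Spearman's rank correlation; the sketch below assumes that setup. The function names and the sample scores are hypothetical, and ties are handled by averaging ranks.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    llm_scores = [4, 3, 5, 2, 4]    # hypothetical automatic scores
    human_scores = [5, 3, 4, 1, 4]  # hypothetical human ratings
    print(round(spearman(llm_scores, human_scores), 3))
```

A higher correlation means the automatic scores order responses more like human annotators do; in practice this is usually computed per dimension and per dataset.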

