CL · AI · LG · Aug 22, 2023

Diversity Measures: Domain-Independent Proxies for Failure in Language Model Queries

arXiv:2308.11189v1 · 8 citations · h-index: 26 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses error prediction for users of large language models. Its contribution is an incremental advance: novel proxies for failure that do not rely on domain-specific information.

The paper tackles the problem of predicting errors in large language models by introducing domain-independent diversity measures based on entropy, Gini impurity, and centroid distance. It demonstrates that these measures correlate strongly with failure probability across multiple datasets and temperature settings.

Error prediction in large language models often relies on domain-specific information. In this paper, we present measures for quantification of error in the response of a large language model based on the diversity of responses to a given prompt - hence independent of the underlying application. We describe how three such measures - based on entropy, Gini impurity, and centroid distance - can be employed. We perform a suite of experiments on multiple datasets and temperature settings to demonstrate that these measures strongly correlate with the probability of failure. Additionally, we present empirical results demonstrating how these measures can be applied to few-shot prompting, chain-of-thought reasoning, and error detection.
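To make the three measures concrete, here is a minimal sketch of how each could be computed over a set of responses sampled for one prompt. It uses the textbook definitions of Shannon entropy and Gini impurity over the empirical distribution of distinct answers, and the mean Euclidean distance to the centroid over response embeddings. These definitions, the function names, and the assumption that responses are compared by exact string match are illustrative choices, not necessarily the formulations used in the paper. The embeddings are assumed to come from some external sentence encoder that is not shown here.

```python
import math
from collections import Counter


def entropy(responses):
    """Shannon entropy (in bits) of the empirical answer distribution."""
    counts = Counter(responses)
    n = len(responses)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def gini_impurity(responses):
    """Probability that two answers drawn with replacement differ."""
    counts = Counter(responses)
    n = len(responses)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def centroid_distance(embeddings):
    """Mean Euclidean distance of each embedding from the centroid.

    `embeddings` is a list of equal-length numeric vectors; how they are
    produced (which encoder, which layer) is an assumption left open here.
    """
    n = len(embeddings)
    dim = len(embeddings[0])
    centroid = [sum(v[i] for v in embeddings) / n for i in range(dim)]
    return sum(math.dist(v, centroid) for v in embeddings) / n


if __name__ == "__main__":
    # Hypothetical answers sampled several times for the same prompt.
    samples = ["42", "42", "41", "42", "40"]
    print("entropy:", entropy(samples))
    print("gini impurity:", gini_impurity(samples))

    # Hypothetical 2-D embeddings standing in for real encoder output.
    vectors = [[0.0, 0.0], [2.0, 0.0], [1.0, 1.0]]
    print("centroid distance:", centroid_distance(vectors))
```

Under this sketch, all three scores are zero when every sampled response is identical and grow as the responses spread out, which is the sense in which higher diversity would serve as a proxy for a higher probability of failure.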

Code Implementations: 1 repo
