CL · AI · Nov 15, 2023

How Well Do Large Language Models Truly Ground?

UW
arXiv:2311.09069v2 · 34 citations · h-index: 16
Originality: Incremental advance
AI Analysis

This work addresses hallucinations and lack of control in LLMs, a concern for developers and researchers. It is incremental in that it refines existing grounding concepts rather than introducing a new one.

The paper tackles unreliable grounding in Large Language Models (LLMs). It proposes a stricter definition of grounding that requires a response to fully use the necessary knowledge in the provided context and to stay within the limits of that knowledge. It then introduces a new dataset and a grounding metric, evaluates 25 LLMs, and reports on the factors that affect grounding performance.

To reduce issues like hallucinations and lack of control in Large Language Models (LLMs), a common method is to generate responses by grounding on external contexts given as input, known as knowledge-augmented models. However, previous research often narrowly defines "grounding" as just having the correct answer, which does not ensure the reliability of the entire response. To overcome this, we propose a stricter definition of grounding: a model is truly grounded if it (1) fully utilizes the necessary knowledge from the provided context, and (2) stays within the limits of that knowledge. We introduce a new dataset and a grounding metric to evaluate model capability under the definition. We perform experiments across 25 LLMs of different sizes and training methods and provide insights into factors that influence grounding performance. Our findings contribute to a better understanding of how to improve grounding capabilities and suggest an area of improvement toward more reliable and controllable LLM applications.
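
The two conditions in the stricter definition can be illustrated with a small set-based check. This is only a minimal sketch under strong assumptions, not the paper's metric: it assumes the context, the required knowledge, and the response have already been reduced to comparable "fact" strings, and it uses exact string matching where a real evaluation would need some form of semantic comparison. The names `GroundingReport` and `check_grounding` are hypothetical and introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundingReport:
    """Result of a toy grounding check (hypothetical helper, not from the paper)."""
    missing: frozenset      # required facts the response left out
    unsupported: frozenset  # response facts that do not appear in the context

    @property
    def fully_utilizes(self) -> bool:
        # Condition (1): every necessary fact from the context is used.
        return not self.missing

    @property
    def stays_within(self) -> bool:
        # Condition (2): nothing in the response goes beyond the context.
        return not self.unsupported

    @property
    def grounded(self) -> bool:
        return self.fully_utilizes and self.stays_within


def check_grounding(required_facts, context_facts, response_facts) -> GroundingReport:
    """Compare fact sets by exact match (an assumption made for this sketch)."""
    required = frozenset(required_facts)
    context = frozenset(context_facts)
    response = frozenset(response_facts)
    return GroundingReport(
        missing=required - response,
        unsupported=response - context,
    )


# Illustrative inputs (made up for this example).
context = {"the bridge opened in 1932", "the bridge spans the harbour"}
required = {"the bridge opened in 1932"}
response = {"the bridge opened in 1932", "the bridge is painted grey"}

report = check_grounding(required, context, response)
print(report.fully_utilizes, report.stays_within, report.grounded)
```

In this example the response contains the required fact, so condition (1) holds, but it also asserts something absent from the context, so condition (2) fails and the response does not count as grounded even though the "correct answer" is present.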

Code Implementations: 1 repo
