CL · Nov 2, 2023

The Effect of Scaling, Retrieval Augmentation and Form on the Factual Consistency of Language Models

Lovisa Hagström, Denitsa Saynova, Tobias Norlund, Moa Johansson, Richard Johansson

arXiv:2311.01307v121.6135 citations · h-index: 4 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses unreliable factual outputs from LLMs, a concern for users who rely on them for knowledge. It is an incremental advance, since it builds on existing methods to analyze and mitigate inconsistency.

The study tackled the problem of factual inconsistency in large language models (LLMs) by evaluating scaling and retrieval augmentation as mitigation strategies, finding that both reduce inconsistency with retrieval augmentation being more efficient, and that syntactical form affects consistency across models.

Large Language Models (LLMs) make natural interfaces to factual knowledge, but their usefulness is limited by their tendency to deliver inconsistent answers to semantically equivalent questions. For example, a model might predict both "Anne Redpath passed away in Edinburgh." and "Anne Redpath's life ended in London." In this work, we identify potential causes of inconsistency and evaluate the effectiveness of two mitigation strategies: up-scaling and augmenting the LM with a retrieval corpus. Our results on the LLaMA and Atlas models show that both strategies reduce inconsistency while retrieval augmentation is considerably more efficient. We further consider and disentangle the consistency contributions of different components of Atlas. For all LMs evaluated we find that syntactical form and other evaluation task artifacts impact consistency. Taken together, our results provide a better understanding of the factors affecting the factual consistency of language models.
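One common way to quantify the inconsistency described in the abstract is to compare a model's answers across paraphrases of the same query and count how often pairs of answers agree. The sketch below is illustrative only and is not taken from the paper: the function name `pairwise_consistency` is hypothetical, exact string match is an assumed agreement criterion, and the third answer in the example is invented to extend the Anne Redpath case.

```python
from itertools import combinations


def pairwise_consistency(predictions):
    """Return the fraction of unordered prediction pairs that agree.

    `predictions` holds one answer per paraphrase of the same query.
    Agreement is assumed to mean exact string equality.
    """
    pairs = list(combinations(predictions, 2))
    if not pairs:
        # Fewer than two predictions: nothing can disagree.
        return 1.0
    agreeing = sum(1 for a, b in pairs if a == b)
    return agreeing / len(pairs)


# Answers to three paraphrases about where Anne Redpath died.
# The first two mirror the abstract's example; the third is hypothetical.
answers = ["Edinburgh", "London", "Edinburgh"]
score = pairwise_consistency(answers)
print(f"pairwise consistency: {score:.2f}")
```

A score of 1.0 would mean every paraphrase produced the same answer. Note that this measures agreement only, so a model can be fully consistent while being wrong.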
