CL · Apr 5, 2024

Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data

arXiv:2404.03862v4 · 17 citations · h-index: 47 · NAACL
Originality: Incremental advance
AI Analysis

This work addresses the trustworthiness problem faced by users who rely on LLM outputs by making those outputs easier to verify. It is an incremental advance because it builds on existing alignment methods.

The paper tackles the problem of verifying the correctness of large language model (LLM) generations by aligning models to quote verbatim from trusted pre-training data. This yields up to a 130% relative increase in verbatim quotes while maintaining response quality.
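To read the headline figure, note that a relative increase compares the tuned model's quoting score with the base model's. Writing $q_{\text{base}}$ and $q_{\text{tuned}}$ for those scores (symbols introduced here for illustration; this page does not define the paper's exact quoting metric):

$$
\Delta_{\text{rel}} = \frac{q_{\text{tuned}} - q_{\text{base}}}{q_{\text{base}}}, \qquad \Delta_{\text{rel}} = 1.3 \;\Longrightarrow\; q_{\text{tuned}} = 2.3\, q_{\text{base}}.
$$

So "up to 130%" means that, in the best reported setting, the tuned model quotes about 2.3 times as much as its base model. It does not mean that 130% of the output is quoted.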

To trust the fluent generations of large language models (LLMs), humans must be able to verify their correctness against trusted, external sources. Recent efforts, such as providing citations via retrieved documents or post-hoc provenance, enhance verifiability but provide no guarantees on their correctness. To address these limitations, we tackle the verifiability goal with a different philosophy: trivializing the verification process by developing models that quote verbatim statements from trusted sources in their pre-training data. We propose Quote-Tuning, which demonstrates the feasibility of aligning models to quote. The core of Quote-Tuning is a fast membership inference function that efficiently verifies text against trusted corpora. We leverage this tool to design a reward function to quantify quotes in model responses, and curate datasets for preference learning. Experiments show that Quote-Tuning significantly increases verbatim quotes from high-quality documents by up to 130% relative to base models while maintaining response quality. Quote-Tuning is applicable in different tasks, generalizes to out-of-domain data and diverse model families, and provides additional benefits to truthfulness. Our method not only serves as a hassle-free method to increase quoting but also opens up avenues for improving LLM trustworthiness through better verifiability.
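The abstract names three components: a membership function that checks text against a trusted corpus, a reward that quantifies quoting in a response, and preference data built from that reward. The following is a minimal sketch of how these pieces could fit together. It is not the paper's implementation. It assumes whitespace tokenization, a plain Python `set` of word n-grams standing in for the paper's fast membership inference function, a reward defined as the fraction of response tokens covered by corpus n-grams, and a simple "highest vs. lowest reward" rule for picking preference pairs. All function names and the n-gram length `n` are hypothetical.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_index(corpus_docs, n=8):
    """Hypothetical membership index: the set of every word n-gram in the trusted corpus.
    A stand-in for a fast, scalable membership structure over pre-training data."""
    index = set()
    for doc in corpus_docs:
        index.update(ngrams(doc.split(), n))
    return index


def quote_ratio(response, index, n=8):
    """Assumed reward: fraction of response tokens covered by at least one
    n-gram that appears verbatim in the trusted corpus."""
    tokens = response.split()
    if len(tokens) < n:
        return 0.0
    covered = [False] * len(tokens)
    for i, gram in enumerate(ngrams(tokens, n)):
        if gram in index:
            # Mark every token inside the matched n-gram as quoted.
            for j in range(i, i + n):
                covered[j] = True
    return sum(covered) / len(tokens)


def make_preference_pair(prompt, responses, index, n=8):
    """Assumed curation rule: the sampled response with the highest quote ratio
    is 'chosen', the one with the lowest is 'rejected'."""
    ranked = sorted(responses, key=lambda r: quote_ratio(r, index, n))
    return {"prompt": prompt, "chosen": ranked[-1], "rejected": ranked[0]}


if __name__ == "__main__":
    corpus_index = build_index(["the quick brown fox jumps over the lazy dog"], n=4)
    pair = make_preference_pair(
        "Describe the fox.",
        ["the quick brown fox jumps today", "a fox did something"],
        corpus_index,
        n=4,
    )
    print(pair["chosen"])  # the quick brown fox jumps today
```

Pairs like these could then feed a preference-learning objective. The paper's actual reward, filtering, and pair-selection criteria may differ from this sketch, for example in how they control for response length or quality.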
