$\textit{Dial BeInfo for Faithfulness}$: Improving Factuality of Information-Seeking Dialogue via Behavioural Fine-Tuning
This work addresses factuality issues in information-seeking dialogue systems and offers a practical way to improve reliability in applications such as customer support or search assistants, although the contribution is incremental, since it builds on existing fine-tuning techniques.
The paper tackled the problem of hallucinations in large language models for information-seeking dialogue by introducing BeInfo, a behavioural fine-tuning method that made models considerably more faithful to the knowledge source on both seen and unseen domains; a 3B-parameter model (e.g., Flan-T5), tuned with BeInfo on a limited amount of in-domain dialogues, outperformed GPT-4 on real production conversations.
Factuality is a crucial requirement in information-seeking dialogue: the system should respond to the user's queries so that the responses are meaningful and aligned with the knowledge provided to the system. However, most modern large language models suffer from hallucinations, that is, they generate responses not supported by or contradicting the knowledge source. To mitigate the issue and increase faithfulness of information-seeking dialogue systems, we introduce BeInfo, a simple yet effective method that applies behavioural tuning to aid information-seeking dialogue. Relying on three standard datasets, we show that models tuned with BeInfo become considerably more faithful to the knowledge source, both on datasets and domains seen during BeInfo-tuning and on unseen domains, when applied in a zero-shot manner. In addition, we show that models with 3B parameters (e.g., Flan-T5) tuned with BeInfo demonstrate strong performance on data from real `production' conversations and outperform GPT-4 when tuned on a limited amount of such realistic in-domain dialogues.