CLAILGFeb 26, 2025

Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in Product QA Agents

arXiv:2502.19545v21 citationsh-index: 23Has Code
AI Analysis

This addresses cost and reliability issues in customer support AI for businesses, offering a scalable solution with open-source models, though it is incremental in optimizing existing methods.

The paper tackled hallucination in product QA agents by comparing knowledge distillation and self-training, finding that self-training with synthetic data reduces hallucination comparably to using stronger models like GPT-4o, with improvements in handling unanswerable questions.

The deployment of Large Language Models (LLMs) in customer support is constrained by hallucination (generating false information) and the high cost of proprietary models. To address these challenges, we propose a retrieval-augmented question-answering (QA) pipeline and explore how to balance human input and automation. Using a dataset of questions about a Samsung Smart TV user manual, we demonstrate that synthetic data generated by LLMs outperforms crowdsourced data in reducing hallucination in finetuned models. We also compare self-training (fine-tuning models on their own outputs) and knowledge distillation (fine-tuning on stronger models' outputs, e.g., GPT-4o), and find that self-training achieves comparable hallucination reduction. We conjecture that this surprising finding can be attributed to increased exposure bias issues in the knowledge distillation case and support this conjecture with post hoc analysis. We also improve robustness to unanswerable questions and retrieval failures with contextualized "I don't know" responses. These findings show that scalable, cost-efficient QA systems can be built using synthetic data and self-training with open-source models, reducing reliance on proprietary tools or costly human annotations.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes