CL | Aug 28, 2025

KCS: Diversify Multi-hop Question Generation with Knowledge Composition Sampling

arXiv:2508.20567v1 | 2 citations | h-index: 5 | Has Code | EMNLP
Originality: Incremental advance
AI Analysis

The paper addresses data sparsity in multi-hop QA systems, though it appears incremental because it builds on prior diversification methods.

The paper tackles data sparsity in multi-hop question answering by introducing Knowledge Composition Sampling (KCS), a framework that diversifies question generation by sampling varied knowledge compositions. KCS improves knowledge composition selection accuracy by 3.9% over competitive baselines, and using it for data augmentation improves performance on the HotpotQA and 2WikiMultihopQA datasets.

Multi-hop question answering faces substantial challenges due to data sparsity, which increases the likelihood of language models learning spurious patterns. To address this issue, prior research has focused on diversifying question generation through content planning and varied expression. However, these approaches often emphasize generating simple questions and neglect the integration of essential knowledge, such as relevant sentences within documents. This paper introduces Knowledge Composition Sampling (KCS), an innovative framework designed to expand the diversity of generated multi-hop questions by sampling varied knowledge compositions within a given context. KCS models knowledge composition selection as a sentence-level conditional prediction task and uses a probabilistic contrastive loss to predict the next most relevant piece of knowledge. During inference, we employ a stochastic decoding strategy to balance accuracy and diversity. Compared to competitive baselines, our KCS improves the overall accuracy of knowledge composition selection by 3.9%, and its application for data augmentation yields improvements on the HotpotQA and 2WikiMultihopQA datasets. Our code is available at: https://github.com/yangfanww/kcs.
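
The abstract describes the pipeline only at a high level, so the following is a rough sketch of the general idea rather than the authors' implementation. It assumes each sentence is a toy embedding vector, uses a hypothetical scorer (`toy_score`) in place of the trained model, substitutes a standard InfoNCE-style loss for the paper's probabilistic contrastive loss (whose exact form is not given here), and uses temperature sampling as one possible stochastic decoding strategy. All function names and parameters are illustrative.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Turn raw scores into a probability distribution."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def info_nce_loss(scores, positive_index, temperature=1.0):
    """InfoNCE-style stand-in (an assumption, not the paper's loss):
    negative log-probability of the gold next sentence among candidates."""
    return -math.log(softmax(scores, temperature)[positive_index])


def toy_score(selected, candidate, embeddings):
    """Hypothetical scorer: dot product between the candidate and the mean
    embedding of the sentences chosen so far (zero vector when none chosen)."""
    dim = len(embeddings[candidate])
    if selected:
        context = [
            sum(embeddings[i][d] for i in selected) / len(selected)
            for d in range(dim)
        ]
    else:
        context = [0.0] * dim
    return sum(c * e for c, e in zip(context, embeddings[candidate]))


def sample_composition(embeddings, size, temperature=1.0, rng=None):
    """Sample `size` distinct sentence indices, one conditional step at a time."""
    if size > len(embeddings):
        raise ValueError("size exceeds the number of available sentences")
    rng = rng or random.Random()
    selected = []
    while len(selected) < size:
        candidates = [i for i in range(len(embeddings)) if i not in selected]
        scores = [toy_score(selected, c, embeddings) for c in candidates]
        probs = softmax(scores, temperature)
        # Stochastic decoding: draw the next sentence instead of taking argmax.
        selected.append(rng.choices(candidates, weights=probs, k=1)[0])
    return selected


if __name__ == "__main__":
    # Four toy sentence embeddings; different seeds give different compositions.
    sentences = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    for seed in range(3):
        print(sample_composition(sentences, 2, temperature=0.5, rng=random.Random(seed)))
```

In this sketch, a lower `temperature` concentrates probability on the highest-scoring sentence (favoring accuracy), while a higher one spreads it out (favoring diversity), which is one simple way to picture the accuracy/diversity balance the abstract mentions.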

