CL
Feb 28, 2024

NewsQs: Multi-Source Question Generation for the Inquiring Mind

Amazon
arXiv:2402.18479v2
h-index: 20
Originality: Synthesis-oriented
AI Analysis

NewsQs provides a resource for query-based multi-document summarization, but the contribution is incremental: it builds on existing datasets and methods.

The authors address question-answer pair generation over multiple news documents by building the NewsQs dataset. Its questions are generated by a T5-Large model fine-tuned with control codes, and human evaluators judged them acceptable more often than questions from the same model without control codes.

We present NewsQs (news-cues), a dataset that provides question-answer pairs for multiple news documents. To create NewsQs, we augment a traditional multi-document summarization dataset with questions automatically generated by a T5-Large model fine-tuned on FAQ-style news articles from the News On the Web corpus. We show that fine-tuning a model with control codes produces questions that are judged acceptable more often than the same model without them as measured through human evaluation. We use a QNLI model with high correlation with human annotations to filter our data. We release our final dataset of high-quality questions, answers, and document clusters as a resource for future work in query-based multi-document summarization.
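The generate-then-filter flow described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `<faq>` control-code format, the `QAPair` fields, the 0.5 threshold, and especially `toy_overlap_scorer`, which is a simple word-overlap stand-in and not the QNLI model the authors use. In practice the scorer would be replaced by a real QNLI classifier that returns an entailment-style score for a question-answer pair.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QAPair:
    """One generated question with its answer and source cluster (hypothetical schema)."""
    question: str
    answer: str
    cluster_id: str


def with_control_code(code: str, source_text: str) -> str:
    """Prefix generator input with a control code. The "<code> text" format is an assumption."""
    return f"<{code}> {source_text}"


def toy_overlap_scorer(question: str, answer: str) -> float:
    """Stand-in for a QNLI model: share of longer question words found in the answer."""
    def words(text: str) -> set:
        return {w.strip("?.,").lower() for w in text.split()}

    q_words = {w for w in words(question) if len(w) > 3}
    if not q_words:
        return 0.0
    return len(q_words & words(answer)) / len(q_words)


def filter_pairs(
    pairs: List[QAPair],
    scorer: Callable[[str, str], float],
    threshold: float = 0.5,  # illustrative cutoff, not taken from the paper
) -> List[QAPair]:
    """Keep only pairs whose score meets the threshold."""
    return [p for p in pairs if scorer(p.question, p.answer) >= threshold]


pairs = [
    QAPair("What caused the flooding downtown?",
           "Heavy rain caused flooding downtown on Tuesday.", "c1"),
    QAPair("Who won the election?",
           "The stadium opens next spring.", "c2"),
]
kept = filter_pairs(pairs, toy_overlap_scorer)
prompt = with_control_code("faq", "Heavy rain caused flooding downtown on Tuesday.")
```

Passing the scorer in as a function keeps the filtering step independent of the model behind it, so the toy scorer can be swapped for a learned one without changing `filter_pairs`.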
