CL | AI | LG | Mar 26, 2024

SciNews: From Scholarly Complexities to Public Narratives -- A Dataset for Scientific News Report Generation

arXiv:2403.17768v2 | 83 citations | h-index: 6 | LREC
Originality: Synthesis-oriented
AI Analysis

This dataset addresses the need to make scientific findings accessible to the public, though the contribution is incremental because it builds on existing text generation methods.

The authors introduce SciNews, a dataset that pairs academic papers with their corresponding news reports across nine disciplines to support automated scientific news generation, and benchmark it with state-of-the-art models using both automatic and human evaluation.

Scientific news reports serve as a bridge, adeptly translating complex research articles into reports that resonate with the broader public. The automated generation of such narratives enhances the accessibility of scholarly insights. In this paper, we present a new corpus to facilitate this paradigm development. Our corpus comprises a parallel compilation of academic publications and their corresponding scientific news reports across nine disciplines. To demonstrate the utility and reliability of our dataset, we conduct an extensive analysis, highlighting the divergences in readability and brevity between scientific news narratives and academic manuscripts. We benchmark our dataset employing state-of-the-art text generation models. The evaluation process involves both automatic and human evaluation, which lays the groundwork for future explorations into the automated generation of scientific news reports. The dataset and code related to this work are available at https://dongqi.me/projects/SciNews.

