CL, Oct 20, 2021

SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation

arXiv:2110.10774v1664 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for better text generation in scientific papers by providing a large-scale dataset, but it is incremental as it builds on existing text generation tasks with a focus on context.

The authors tackled the problem of generating scientific text that requires external context by introducing a new task of context-aware text generation and creating the SciXGen dataset with 205,304 annotated papers. They benchmarked state-of-the-art methods on this dataset to evaluate its efficacy in generating descriptions and paragraphs.

Generating text in scientific papers requires not only capturing the content of the given input but also, frequently, acquiring external information called context. We advance scientific text generation by proposing a new task, context-aware text generation in the scientific domain, which aims to exploit the contribution of context to the generated text. To this end, we present a novel, challenging, large-scale Scientific Paper Dataset for ConteXt-Aware Text Generation (SciXGen), consisting of 205,304 well-annotated papers with full references to widely used objects (e.g., tables, figures, algorithms) in a paper. We comprehensively benchmark state-of-the-art methods on the newly constructed SciXGen dataset to assess its efficacy for generating descriptions and paragraphs. Our dataset and benchmarks will be made publicly available to facilitate research on scientific text generation.

