HC · CL · NC · Feb 5, 2025

ScholaWrite: A Dataset of End-to-End Scholarly Writing Process

arXiv:2502.02904v4 · 9 citations · h-index: 22
Originality: Incremental advance
AI Analysis

This dataset addresses the need for realistic writing data to develop better writing assistants for scientists. The advance is incremental because the data covers a single domain: computer science preprints.

The authors tackled the problem of understanding the cognitive process behind scholarly writing by creating ScholaWrite, the first dataset of end-to-end scholarly writing, which includes nearly 62K text changes from five computer science preprints over four months.

Writing is a cognitively demanding activity that requires constant decision-making, heavy reliance on working memory, and frequent shifts between tasks with different goals. To build writing assistants that truly align with writers' cognition, we must capture and decode the complete thought process behind how writers transform ideas into final texts. We present ScholaWrite, the first dataset of end-to-end scholarly writing, tracing the multi-month journey from initial drafts to final manuscripts. We contribute three key advances: (1) a Chrome extension that unobtrusively records keystrokes on Overleaf, enabling the collection of realistic, in-situ writing data; (2) a novel corpus of full scholarly manuscripts, enriched with fine-grained annotations of cognitive writing intentions, comprising LaTeX-based edits from five computer science preprints and capturing nearly 62K text changes over four months; and (3) analyses and insights into the micro-dynamics of scholarly writing, highlighting gaps between human writing processes and the current capabilities of large language models (LLMs) in providing meaningful assistance. ScholaWrite underscores the value of capturing end-to-end writing data to develop future writing assistants that support, not replace, the cognitive work of scientists.

