HC · CL · Mar 16, 2025

CorpusStudio: Surfacing Emergent Patterns in a Corpus of Prior Work while Writing

Hai Dang, Chelse Swoopes, Daniel Buschek, Elena L. Glassman

Harvard

arXiv:2503.12436v1 · 14.68 citations · h-index: 33 · CHI

Originality: Incremental advance

AI Analysis

This work addresses the difficulty that writers, such as scientists, have in externalizing and applying community-specific writing norms. It is an incremental advance because it builds on existing retrieval and highlighting techniques.

The authors address the problem of helping writers understand and apply implicit writing norms. They propose two writing support concepts that surface document-level and sentence-level patterns from a corpus. In a study (N=16), participants revised their writing and gained confidence in aligning with or breaking norms.

Many communities, including the scientific community, develop implicit writing norms. Understanding them is crucial for effective communication with that community. Writers gradually develop an implicit understanding of norms by reading papers and receiving feedback on their writing. However, it is difficult to both externalize this knowledge and apply it to one's own writing. We propose two new writing support concepts that reify document and sentence-level patterns in a given text corpus: (1) an ordered distribution over section titles and (2) given the user's draft and cursor location, many retrieved contextually relevant sentences. Recurring words in the latter are algorithmically highlighted to help users see any emergent norms. Study results (N=16) show that participants revised the structure and content using these concepts, gaining confidence in aligning with or breaking norms after reviewing many examples. These results demonstrate the value of reifying distributions over other authors' writing choices during the writing process.
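The two concepts in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the "ordered distribution over section titles" can be approximated by the share of documents containing each title, sorted by mean relative position, and that "recurring words" are non-stopwords appearing in at least a chosen share of sentences that have already been retrieved. The function names, the stopword list, and the `min_share` threshold are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "of", "to", "and", "in", "we", "is", "for"}


def section_title_distribution(documents):
    """Return (title, share_of_documents, mean_relative_position) rows.

    `documents` is a list of section-title lists, one list per paper.
    Rows are sorted by mean relative position (0 = first, 1 = last).
    """
    counts = Counter()
    position_sums = defaultdict(float)
    for titles in documents:
        seen = set()
        for index, title in enumerate(titles):
            key = title.strip().lower()
            if key in seen:
                continue  # count each title once per document
            seen.add(key)
            counts[key] += 1
            position_sums[key] += index / max(len(titles) - 1, 1)
    rows = [
        (key, counts[key] / len(documents), position_sums[key] / counts[key])
        for key in counts
    ]
    return sorted(rows, key=lambda row: (row[2], -row[1], row[0]))


def recurring_words(sentences, min_share=0.5):
    """Return words that occur in at least `min_share` of the sentences."""
    sentence_counts = Counter()
    for sentence in sentences:
        words = set(re.findall(r"[a-z']+", sentence.lower())) - STOPWORDS
        sentence_counts.update(words)
    threshold = min_share * len(sentences)
    return {word for word, n in sentence_counts.items() if n >= threshold}


def highlight(sentence, recurring):
    """Wrap recurring words in asterisks, leaving other text unchanged."""
    def mark(match):
        word = match.group(0)
        return f"*{word}*" if word.lower() in recurring else word
    return re.sub(r"[A-Za-z']+", mark, sentence)
```

In this sketch, retrieving the contextually relevant sentences is assumed to have happened already. A caller would pass those sentences to `recurring_words` and then apply `highlight` to each one before display.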
