SE · NE · Apr 17, 2020

An Annotated Dataset of Stack Overflow Post Edits

arXiv:2004.08193v2 · 7 citations
AI Analysis

This work gives software engineering researchers a new dataset for analyzing edits at high resolution, though the contribution is incremental because it builds on existing mining approaches.

The authors address the limited resolution of mining software repositories by creating an annotated dataset of over 7 million code and text edits from Stack Overflow. Their preliminary results suggest the dataset could be valuable for extracting fine-grained patches, such as those that optimize non-functional properties.

To improve software engineering, software repositories have been mined for code snippets and bug fixes. Typically, this mining takes place at the level of files or commits. To be able to dig deeper and to extract insights at a higher resolution, we hereby present an annotated dataset that contains over 7 million edits of code and text on Stack Overflow. Our preliminary study indicates that these edits might be a treasure trove for mining information about fine-grained patches, e.g., for the optimisation of non-functional properties.

