SE | Sep 8, 2018

SOTorrent: Studying the Origin, Evolution, and Usage of Stack Overflow Code Snippets

arXiv:1809.02814v2 | 69 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses researchers' need to track how code evolves and is used in software development communities. The contribution is incremental, since the dataset builds on the existing official SO data dump.

The researchers tackled the problem of analyzing how Stack Overflow code snippets evolve and are used by creating SOTorrent, an open dataset that provides version history for posts and links them to other platforms. The result is a resource for studying code maintenance on Stack Overflow and its relation to platforms such as GitHub.

Stack Overflow (SO) is the most popular question-and-answer website for software developers, providing a large amount of copyable code snippets. Like other software artifacts, code on SO evolves over time, for example when bugs are fixed or APIs are updated to the most recent version. To be able to analyze how code and the surrounding text on SO evolve, we built SOTorrent, an open dataset based on the official SO data dump. SOTorrent provides access to the version history of SO content at the level of whole posts and individual text and code blocks. It connects code snippets from SO posts to other platforms by aggregating URLs from surrounding text blocks and comments, and by collecting references from GitHub files to SO posts. Our vision is that researchers will use SOTorrent to investigate and understand the evolution and maintenance of code on SO and its relation to other platforms such as GitHub.
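To make the idea of collecting references from source files to SO posts more concrete, here is a minimal Python sketch. It is not SOTorrent's actual extraction pipeline: the regular expression, the `find_so_references` helper, and the sample text are all hypothetical, and the pattern only covers a few common link shapes (`/questions/<id>`, `/q/<id>`, `/a/<id>`).

```python
import re

# Hypothetical pattern (an assumption, not taken from SOTorrent):
# matches a few common Stack Overflow link shapes and captures the post id.
SO_LINK = re.compile(
    r"https?://(?:www\.)?stackoverflow\.com/"
    r"(?P<kind>questions|q|a)/(?P<id>\d+)",
    re.IGNORECASE,
)


def find_so_references(text):
    """Return (kind, post_id) pairs for each Stack Overflow link in text."""
    refs = []
    for match in SO_LINK.finditer(text):
        # "/a/<id>" links point to answers; the other shapes point to questions.
        kind = "answer" if match.group("kind").lower() == "a" else "question"
        refs.append((kind, int(match.group("id"))))
    return refs


# Illustrative input: comments as they might appear in a source file.
sample = """
// Adapted from https://stackoverflow.com/a/12345
// See also http://stackoverflow.com/questions/678/some-title
"""
print(find_so_references(sample))
```

A real analysis would likely need to handle more link variants, such as answer links nested under a question URL, and would need to map each post id back to the dataset's own records. This sketch ignores both.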

