SE · Jul 31, 2018

Sourcerer's Apprentice and the study of code snippet migration

arXiv:1808.00106v1 · 5 citations
Originality: Synthesis-oriented
AI Analysis

The paper addresses a legal and practical issue for software developers and companies that reuse code from StackOverflow, but its contribution is incremental because it applies an existing tool to new data.

The paper tackles the problem of improper software license migration in code snippets shared on StackOverflow. It finds that many snippets taken from Python modules and documentation are relicensed to CC-BY-SA 3.0 without proper attribution, which violates the original licenses and jeopardizes software built by companies and developers who reuse those snippets.

On the World Wide Web, not only are webpages connected but source code is too. Software development is becoming more accessible to everyone, yet software licensing remains complicated. We need to know whether software licenses are being maintained properly throughout the reuse and evolution of code. Software typically employs licenses to describe how it may be used and adapted, but most developers do not have the legal expertise to sort out license conflicts. This motivated the development of the Sourcerer's Apprentice, a web service that helps track clone relicensing. In this paper we put the Apprentice to work on empirical studies that demonstrate there is much sharing between StackOverflow code and Python modules and Python documentation that violates the licensing of the original Python modules and documentation: software snippets shared through StackOverflow are often improperly relicensed to CC-BY-SA 3.0 without maintaining the appropriate attribution. We show that many snippets on StackOverflow are inappropriately relicensed by StackOverflow users, jeopardizing the status of the software built by companies and developers who reuse StackOverflow snippets.

