CY · SE · Jun 29, 2021

The penumbra of open source: projects outside of centralized platforms are longer maintained, more academic and more collaborative

arXiv:2106.15611v3 · 14 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

This research addresses a sampling bias in studies of open source collaboration: GitHub may not fully represent the broader open source ecosystem. The contribution is incremental, extending existing sampling methods.

The study examines how well GitHub represents open source development by comparing projects on and off centralized platforms. It finds that off-platform projects tend to have more collaborators, longer maintenance periods, and a greater focus on academic and scientific problems.

GitHub has become the central online platform for much of open source, hosting most open source code repositories. With this popularity, the public digital traces of GitHub are now a valuable means to study teamwork and collaboration. In many ways, however, GitHub is a convenience sample, and may not be representative of open source development off the platform. Here we develop a novel, extensive sample of public open source project repositories outside of centralized platforms. We characterize these projects along a number of dimensions and compare them to a time-matched sample of corresponding GitHub projects. Our sample projects tend to have more collaborators, are maintained for longer periods, and tend to be more focused on academic and scientific problems.
