SE · Oct 1, 2013

Towards Base Rates in Software Analytics

arXiv:1310.0242v1 · 8 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses how missing base rates impair research and decision-making in software analytics. Its contribution is incremental, as it reports only initial results.

The paper addresses the lack of quantitative base rates in software analytics for basic properties such as code size and growth. It analyzes open source project data from Ohloh and presents summary statistics for lines-of-code metrics.

Nowadays a vast and growing body of open source software (OSS) project data is publicly available on the internet. Despite this public body of project data, the field of software analytics has not yet settled on a solid quantitative base for basic properties such as code size, growth, team size, activity, and project failure. What is missing is a quantification of the base rates of such properties, where other fields (such as medicine) commonly rely on base rates for decision making and the evaluation of experimental results. The lack of knowledge in this area impairs both research activities in the field of software analytics and decision making on software projects in general. This paper contributes initial results of our research towards obtaining base rates using the data available at Ohloh (a large-scale index of OSS projects). Zooming in on the venerable 'lines of code' metric for code size and growth, we present and discuss summary statistics and identify further research challenges.
