A Systematic Mapping Study of Empirical Studies performed with Collections of Software Projects
This study addresses the reduced representativeness and replicability of software engineering experiments, a problem for researchers, but its contribution is incremental: it maps existing practices without proposing new solutions.
The study tackled the lack of standardized strategies for selecting software projects in empirical software engineering research. Of the 122 studies identified, 72% used their own guidelines for project selection and only 27% used existing project collections. There was no evidence of a standardized selection framework, nor of statistical methods related to the selection strategy.
Context: Software projects are common resources in Software Engineering experiments, although they are often selected without following a specific strategy, which reduces the representativeness and replicability of the results. One option is to use preserved collections of software projects, but these must be kept current, with explicit guidelines that guarantee they are updated over a long period of time. Goal: To carry out a systematic secondary study of the strategies used to select software projects in empirical studies, in order to discover the guidelines taken into account, the degree of use of project collections, the metadata extracted, and the subsequent statistical analysis conducted. Method: A systematic mapping study to identify studies published from January 2013 to December 2020. Results: 122 studies were identified, of which 72% used their own guidelines for project selection and 27% used existing project collections. Likewise, there was no evidence of a standardized framework for the project selection process, nor of the application of statistical methods related to the sample collection strategy.