A Literature Survey on Empirical Evidence in Software Engineering
The survey provides a quantitative basis for discussing the state of empirical research in software engineering, but its contribution is incremental: it documents existing practices without introducing new methods.
The paper examines the characteristics of corpora used in software engineering research, with the aim of improving research quality and reproducibility. It finds that almost all papers use corpora of some kind, most commonly collections of source code from open-source Java projects, but that no projects or corpora are used frequently across all papers; recurrences are detected only within some conferences.
Context: Software engineering research makes use of collections of software artifacts (corpora) from which to derive empirical evidence. Goal: To improve the quality and reproducibility of research, we need to understand the characteristics of the corpora used. Method: To that end, we perform a literature survey using grounded theory. We analyze the latest proceedings of seven relevant conferences. Results: While almost all papers use corpora of some kind, with collections of source code of open-source Java projects being the common case, there are no frequently used projects or corpora across all the papers. For some conferences we can detect recurrences. We discover several forms of requirements and applied tunings for corpora, which indicate more specific needs of research efforts. Conclusion: Our survey feeds into a quantitative basis for discussing the current state of empirical research in software engineering, thereby ultimately enabling improvement of research quality, specifically in terms of use (and reuse) of empirical evidence.