A Methodology of Guiding Web Content Mining and Knowledge Discovery in Evidence-based Software Engineering
This paper provides incremental guidance for software engineering researchers who want to incorporate web knowledge more systematically.
The paper tackles the lack of systematic guidelines for using web content in Evidence-Based Software Engineering by adapting Systematic Literature Review methodology to regulate web mining activities and enhance processes with automated components like text mining.
Systematic Literature Review (SLR) is a rigorous methodology applied in Evidence-Based Software Engineering (EBSE) that identifies, assesses, and synthesizes the relevant evidence for answering specific research questions. With the boom of online materials in the era of Web 2.0, technical Web content has started to act as an alternative source of evidence for EBSE. Web knowledge has been investigated and derived through Web content mining and knowledge discovery techniques; however, these techniques still differ significantly from reviewing academic literature. Thus, the direct adoption of Web knowledge in EBSE lacks systematic guidelines. In this paper, we propose an SLR adaptation that bridges this gap in two stages. First, we follow the general logic and procedure of SLR to regulate Web mining activities. Second, we substitute and enhance particular SLR processes with Web-mining-friendly methods and approaches. In the second stage, we mainly focus on adapting the Conducting Review phase by integrating a set of automated components ranging from programmatic searching to various text mining techniques.
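To make the idea of automated components in the Conducting Review phase more concrete, the following is a minimal sketch and not the implementation described in the paper. It assumes a small in-memory corpus in place of a real search API, and every name in it (`Document`, `search`, `screen`, `synthesize`, the stopword list, the example URLs) is hypothetical and chosen only for illustration. The three functions loosely mirror the SLR steps of searching, applying an inclusion criterion, and synthesizing evidence, with a simple term-frequency count standing in for text mining.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Document:
    """A piece of Web content; both fields are illustrative assumptions."""
    url: str
    text: str


# Tiny illustrative stopword list, not a standard one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def search(corpus, query_terms):
    """Stand-in for programmatic searching: keep documents that mention any query term."""
    terms = {t.lower() for t in query_terms}
    return [d for d in corpus if terms & set(tokenize(d.text))]


def screen(docs, min_tokens):
    """Hypothetical inclusion criterion: drop documents with fewer than min_tokens tokens."""
    return [d for d in docs if len(tokenize(d.text)) >= min_tokens]


def synthesize(docs, top_n):
    """Very simple text mining: the top_n most frequent terms across the included documents."""
    counts = Counter()
    for d in docs:
        counts.update(tokenize(d.text))
    return counts.most_common(top_n)


if __name__ == "__main__":
    corpus = [
        Document("https://example.org/a",
                 "Docker containers simplify deployment of cloud services and cloud testing"),
        Document("https://example.org/b", "Cloud"),
        Document("https://example.org/c", "Recipe for pancakes with maple syrup"),
    ]
    hits = search(corpus, ["cloud"])      # search step
    included = screen(hits, min_tokens=3)  # selection step
    print(synthesize(included, top_n=1))   # synthesis step
```

In practice, the search step would call a real search or platform API and the synthesis step would use richer text mining techniques; the sketch only shows how the SLR procedure can structure such a pipeline.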