On a Factorial Knowledge Architecture for Data Science-powered Software Engineering
This work addresses the challenge of blind and meaningless data mining in software engineering, a problem for both researchers and practitioners, though it appears incremental because it builds on existing knowledge frameworks such as SWEBOK.
The paper tackles the problem of applying data science to software engineering by proposing a factor-based hierarchical knowledge architecture that guides the mining of software repositories, with the aim of maximizing their value and inspiring future data-driven studies.
Given the data-intensive and collaborative trend in science, the software engineering community is paying increasing attention to obtaining valuable and useful insights from data repositories. Nevertheless, applying data science to software engineering (e.g., mining software repositories) can be blind and meaningless if it lacks a suitable knowledge architecture (KA). Observing that software engineering practices are generally recorded through a set of factors (e.g., programmer capacity, different environmental conditions) involved in various aspects of a software project, we propose a factor-based hierarchical KA of software engineering to help maximize the value of software repositories and inspire future data-driven software studies. In particular, the organized factors and their relationships guide the mining of software engineering knowledge, while the mined knowledge is in turn indexed and managed through the relevant factors and their interactions. This paper explains our idea of the factorial KA and concisely demonstrates one KA component, namely the early-version KA of software product engineering. Once fully scoped, the proposed KA will supplement the well-known SWEBOK in terms of both factor-centric knowledge management and the coverage and implications of potential software engineering knowledge.
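The two roles the abstract assigns to factors, organizing a hierarchy and indexing mined knowledge by the factors involved, can be illustrated with a minimal sketch. This is not the authors' design: the class names `Factor` and `KnowledgeIndex`, the intermediate "people" level, and the placeholder findings are all assumptions made for illustration, and only "programmer capacity" and "environmental conditions" come from the text.

```python
from collections import defaultdict


class Factor:
    """A node in a factor hierarchy (hypothetical structure, not the paper's KA)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def add_child(self, name):
        child = Factor(name, parent=self)
        self.children.append(child)
        return child

    def path(self):
        # Walk up to the root and join the names, root first.
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


class KnowledgeIndex:
    """Stores findings keyed by the set of factors they relate."""

    def __init__(self):
        self._by_factors = defaultdict(list)

    def add(self, factors, finding):
        # A finding about an interaction is indexed under all factors involved.
        key = frozenset(f.path() for f in factors)
        self._by_factors[key].append(finding)

    def lookup(self, factor):
        # Return findings indexed under the factor or any of its descendants.
        prefix = factor.path()
        results = []
        for key, findings in self._by_factors.items():
            if any(p == prefix or p.startswith(prefix + "/") for p in key):
                results.extend(findings)
        return results


# Assumed example hierarchy; "people" is an invented intermediate level.
root = Factor("software product engineering")
people = root.add_child("people")
capacity = people.add_child("programmer capacity")
environment = root.add_child("environmental conditions")

index = KnowledgeIndex()
index.add([capacity, environment], "placeholder finding A")  # not a real result
index.add([environment], "placeholder finding B")            # not a real result
```

Querying a higher-level factor, for example `index.lookup(people)`, returns findings attached to any factor beneath it, which is one plausible reading of how a hierarchy could both guide mining and organize what is mined.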