SE
Dec 25, 2018

A Variability-Aware Design Approach to the Data Analysis Modeling Process

arXiv:1812.10176v1
7 citations
Originality Synthesis-oriented
AI Analysis

This work addresses the automation of data analysis modeling for software engineers and data scientists. Its contribution is incremental, since it builds on existing methodologies such as CRISP-DM.

The paper tackles the challenge of designing and automating the data analysis modeling phase in big data projects. It proposes a variability-aware design approach that assesses the variability in the CRISP-DM modeling phase, defines a framework design that captures this variability, and evaluates the possibilities for automation, with the aim of enhancing system flexibility and raising the level of automation.

The massive amount of current data has led to many different forms of data analysis processes that aim to explore this data to uncover valuable insights. Methodologies to guide the development of big data science projects, including CRISP-DM and SEMMA, have been widely used in industry and academia. The data analysis modeling phase, which involves decisions on the most appropriate models to adopt, is at the core of these projects. However, from a software engineering perspective, the design and automation of activities performed in this phase are challenging. In this paper, we propose an approach to the data analysis modeling process which involves (i) the assessment of the variability inherent in the CRISP-DM data analysis modeling phase and the provision of feature models that represent this variability; (ii) the definition of a framework structural design that captures the identified variability; and (iii) the evaluation of the developed framework design in terms of the possibilities for process automation. The proposed approach advances the state of the art by offering a variability-aware design solution that can enhance system flexibility, potentially leading to novel software frameworks which can significantly improve the level of automation in the data analysis modeling process.
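
To make the idea of a feature model for the modeling phase more concrete, the following is a minimal Python sketch, not the paper's actual model or framework. The four top-level features mirror the standard CRISP-DM modeling tasks (select modeling technique, generate test design, build model, assess model). Everything else is an assumption made for illustration: the `Feature` class, the `is_valid_configuration` helper, the three alternative technique families, and the optional `HyperparameterTuning` feature are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Feature:
    """A node in a (hypothetical) feature tree."""
    name: str
    mandatory: bool = False   # must be selected whenever its parent is selected
    group: str = "and"        # "and": children are independent; "xor": exactly one child
    children: list[Feature] = field(default_factory=list)


def names(feature: Feature):
    """Yield the names of a feature and all of its descendants."""
    yield feature.name
    for child in feature.children:
        yield from names(child)


def _valid(feature: Feature, selected: set[str]) -> bool:
    # An unselected feature must not have any selected descendants.
    if feature.name not in selected:
        return not any(n in selected for n in names(feature))
    chosen = [c for c in feature.children if c.name in selected]
    # An "xor" group requires exactly one selected child.
    if feature.group == "xor" and feature.children and len(chosen) != 1:
        return False
    # In an "and" group every mandatory child must be selected.
    if feature.group == "and" and any(
        c.mandatory and c.name not in selected for c in feature.children
    ):
        return False
    return all(_valid(c, selected) for c in feature.children)


def is_valid_configuration(root: Feature, selected: set[str]) -> bool:
    """Check a set of selected feature names against the feature tree."""
    known = set(names(root))
    return root.name in selected and selected <= known and _valid(root, selected)


# Illustrative model: top-level tasks follow CRISP-DM; sub-features are assumed.
modeling = Feature("Modeling", mandatory=True, children=[
    Feature("SelectTechnique", mandatory=True, group="xor", children=[
        Feature("Classification"),
        Feature("Regression"),
        Feature("Clustering"),
    ]),
    Feature("GenerateTestDesign", mandatory=True),
    Feature("BuildModel", mandatory=True),
    Feature("AssessModel", mandatory=True),
    Feature("HyperparameterTuning"),  # optional feature
])

config = {"Modeling", "SelectTechnique", "Classification",
          "GenerateTestDesign", "BuildModel", "AssessModel"}
print(is_valid_configuration(modeling, config))
```

A configuration that selects two technique families at once, or that omits a mandatory task such as `BuildModel`, is rejected by the same check. A framework design that captures variability could, under these assumptions, use such a check to decide which concrete components to instantiate.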
