SE · Jan 17, 2019

Mining Treatment-Outcome Constructs from Sequential Software Engineering Data

arXiv:1901.05604v1 · 19 citations
Originality: Incremental advance
AI Analysis

The paper provides a method that lets software engineers and researchers automatically analyze sequential data for empirical insights. It appears to be an incremental advance, because it builds on existing analytical techniques.

The paper addresses the problem of analyzing sequences of events in software engineering to identify significant treatment-outcome relationships. It proposes the Gandhi-Washington Method (GWM), which uses regular expressions and statistical tests to automatically mine such constructs from data such as file-editing histories and release cycles.
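As a rough illustration of the regular-expression step, the sketch below assumes a hypothetical encoding in which each event is one character chosen by the analyst (for example `A` for an edit by a file's main author, `B` for an edit by another developer). It condenses every run of a repeated event into `<event>+` and groups raw sequences by the resulting pattern, treating each pattern as a candidate treatment. This abstraction rule and the helpers `condense` and `infer_treatments` are illustrative assumptions, not the paper's exact scheme.

```python
import re
from collections import defaultdict

# Hypothetical encoding: one character per event, e.g.
# 'A' = edit by the file's main author, 'B' = edit by another developer.
# The pattern matches a character followed by any repeats of the same character.
RUN = re.compile(r"(.)\1*")


def condense(sequence: str) -> str:
    """Collapse each run of one repeated event into '<event>+' (assumed abstraction)."""
    return RUN.sub(lambda m: m.group(1) + "+", sequence)


def infer_treatments(sequences):
    """Group raw sequences by condensed pattern; each pattern is a candidate treatment."""
    groups = defaultdict(list)
    for seq in sequences:
        groups[condense(seq)].append(seq)
    return dict(groups)


if __name__ == "__main__":
    data = ["AAAB", "AB", "AABBB", "ABAB"]
    for pattern, members in infer_treatments(data).items():
        print(pattern, members)
```

With this rule, `AAAB`, `AB`, and `AABBB` all condense to `A+B+`, while `ABAB` becomes `A+B+A+B+`, so the four raw sequences reduce to two structurally different candidate treatments. A coarser or finer abstraction would change which sequences are grouped together.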

Many investigations in empirical software engineering look at sequences of data resulting from development or management processes. In this paper, we propose an analytical approach called the Gandhi-Washington Method (GWM) to investigate the impact of recurring events in software projects. GWM takes as input an encoding of events and activities provided by a software analyst. It uses regular expressions to automatically condense and summarize information and infer treatments. By relating the treatments to the outcome through statistical tests, GWM automatically mines treatment-outcome constructs from the data. The output of GWM is a set of treatment-outcome constructs. Each treatment in the set of mined constructs differs significantly from the other treatments in its impact on the outcome, differs structurally from them in its sequence of events, or both. We describe GWM and the classes of problems to which it can be applied. We demonstrate the applicability of the method in empirical studies on sequences of file editing, code ownership, and release cycle time.
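The abstract says treatments are related to the outcome through statistical tests but does not name them here. The sketch below swaps in a simple two-sided permutation test on the difference in mean outcome between two treatment groups, using only the standard library. The function names, the outcome values in the example, the 10,000 resamples, and the 0.05 threshold are all illustrative assumptions rather than the paper's actual procedure.

```python
import random
from statistics import fmean


def permutation_p_value(x, y, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in mean outcome between two groups."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = abs(fmean(x) - fmean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign outcomes to the two groups
        diff = abs(fmean(pooled[:n_x]) - fmean(pooled[n_x:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_resamples + 1)


def significantly_different(outcomes_a, outcomes_b, alpha=0.05):
    """Hypothetical decision rule: keep two treatments apart if p < alpha."""
    return permutation_p_value(outcomes_a, outcomes_b) < alpha


if __name__ == "__main__":
    # Hypothetical outcome values (e.g. defects per file) for two condensed treatments.
    treatment_1 = [1, 2, 1, 2, 1, 2, 1, 2]
    treatment_2 = [10, 11, 10, 11, 10, 11, 10, 11]
    print(permutation_p_value(treatment_1, treatment_2))
    print(significantly_different(treatment_1, treatment_2))
```

A permutation test makes no normality assumption about the outcome, which suits small, skewed samples common in software repository data. When many treatment pairs are compared, a correction for multiple testing would usually be needed on top of this sketch.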

