SE, LG · Nov 6, 2019

Methods for Stabilizing Models across Large Samples of Projects (with case studies on Predicting Defect and Project Health)

arXiv:1911.04250v4 · 3 citations · Has Code
Originality Highly original
AI Analysis

This paper addresses the lack of widely accepted, stable quantitative models in software engineering by offering a scalable approach to defect and project health prediction across hundreds of projects.

The paper tackles the problem of generating stable models for software quality prediction across large samples of projects. Its transfer learning framework, STABILIZER, finds a minimal set of models (e.g., one for defect prediction across 756 projects) that perform as well as or better than prior state-of-the-art methods.

Despite decades of research, SE lacks widely accepted models (that offer precise, stable, quantitative predictions) about what factors most influence software quality. This paper provides a promising result showing such stable models can be generated using a new transfer learning framework called "STABILIZER". Given a tree of recursively clustered projects (using project meta-data), STABILIZER promotes a model upwards if it performs best in the lower clusters (stopping when the promoted model performs worse than the models seen at a lower level). The number of models found by STABILIZER is minimal: one for defect prediction (756 projects) and fewer than a dozen for project health (1628 projects). Hence, via STABILIZER, it is possible to find a few projects which can be used for transfer learning and to draw conclusions that hold across hundreds of projects at a time. Further, the models produced in this manner offer predictions that perform as well as or better than the prior state-of-the-art. To the best of our knowledge, STABILIZER is an order of magnitude faster than the prior state-of-the-art transfer learners which seek to find conclusion stability, and these case studies are the largest demonstration of the generalizability of quantitative predictions of project quality yet reported in the SE literature. In order to support open science, all our scripts and data are online at https://github.com/Anonymous633671/STABILIZER.
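
The bottom-up promotion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Cluster` type, the `promote` function, the `score` callback, and the exact stopping rule (compare the promoted model against the mean score of the lower-level models) are all assumptions made for illustration. The abstract only states that a model is promoted while it performs best in the lower clusters and that promotion stops once it performs worse than the models seen at a lower level.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Cluster:
    """Hypothetical node in the tree of recursively clustered projects."""
    projects: List[str]                                   # every project under this node
    children: List["Cluster"] = field(default_factory=list)


# score(source, targets): how well a model trained on `source` predicts
# `targets`. Higher is better. The metric itself is an assumption.
ScoreFn = Callable[[str, List[str]], float]


def promote(node: Cluster, score: ScoreFn) -> List[Tuple[str, float]]:
    """Return the (source project, score) pairs that survive at `node`."""
    if not node.children:
        # Leaf cluster: pick the project whose model does best on its peers.
        best = max(node.projects, key=lambda p: score(p, node.projects))
        return [(best, score(best, node.projects))]

    survivors = [pair for child in node.children
                 for pair in promote(child, score)]

    # If a subtree already stopped promoting (it returned several models),
    # do not promote past it.
    if len(survivors) > len(node.children):
        return survivors

    # Candidate for promotion: the child winner that does best on all
    # projects under this node.
    best = max((p for p, _ in survivors),
               key=lambda p: score(p, node.projects))
    promoted = score(best, node.projects)

    # Assumed stopping rule: keep the lower-level models if the promoted
    # model is worse than their mean score.
    lower_level = sum(s for _, s in survivors) / len(survivors)
    if promoted < lower_level:
        return survivors
    return [(best, promoted)]


# Toy data (invented): each project has a fixed "quality" as a source.
quality = {"a": 0.9, "b": 0.5, "c": 0.7, "d": 0.6}
root = Cluster(
    projects=["a", "b", "c", "d"],
    children=[Cluster(["a", "b"]), Cluster(["c", "d"])],
)
models = promote(root, lambda source, targets: quality[source])
```

In this toy run a single source project survives at the root, mirroring the abstract's "one model" outcome for defect prediction. With a score function in which each source only predicts its own cluster well, the same routine would stop early and return one model per cluster.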

Code Implementations: 1 repo