SEAIJan 15, 2021

Old but Gold: Reconsidering the value of feedforward learners for software analytics

arXiv:2101.06319v24 citationsHas Code
AI Analysis

This work addresses the efficiency and effectiveness of machine learning methods for software engineers, showing that simpler models can outperform complex deep learning approaches, which is incremental but impactful for the domain.

The paper tackles the problem of software analytics tasks by applying simple feedforward neural networks to issue close time prediction and vulnerability detection, achieving new state-of-the-art results on three out of five datasets and matching it on a fourth, with training times reduced from hours to seconds.

There has been an increased interest in the use of deep learning approaches for software analytics tasks. State-of-the-art techniques leverage modern deep learning techniques such as LSTMs, yielding competitive performance, albeit at the price of longer training times. Recently, Galke and Scherp [18] showed that at least for image recognition, a decades-old feedforward neural network can match the performance of modern deep learning techniques. This motivated us to try the same in the SE literature. Specifically, in this paper, we apply feedforward networks with some preprocessing to two analytics tasks: issue close time prediction, and vulnerability detection. We test the hypothesis laid by Galke and Scherp [18], that feedforward networks suffice for many analytics tasks (which we call, the "Old but Gold" hypothesis) for these two tasks. For three out of five datasets from these tasks, we achieve new high-water mark results (that out-perform the prior state-of-the-art results) and for a fourth data set, Old but Gold performed as well as the recent state of the art. Furthermore, the old but gold results were obtained orders of magnitude faster than prior work. For example, for issue close time, old but gold found good predictors in 90 seconds (as opposed to the newer methods, which took 6 hours to run). Our results supports the "Old but Gold" hypothesis and leads to the following recommendation: try simpler alternatives before more complex methods. At the very least, this will produce a baseline result against which researchers can compare some other, supposedly more sophisticated, approach. And in the best case, they will obtain useful results that are as good as anything else, in a small fraction of the effort. To support open science, all our scripts and data are available on-line at https://github.com/fastidiouschipmunk/simple.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes