A Machine-learning Based Ensemble Method For Anti-patterns Detection
This addresses the issue of low accuracy and high false positives in anti-pattern detection for software developers, though it is incremental as it builds on prior machine-learning approaches.
The paper tackles the problem of detecting anti-patterns in source code by proposing SMAD, a machine-learning ensemble method that aggregates existing detection tools, showing it improves aggregated tools and outperforms other ensemble methods in experiments on eight Java projects.
Anti-patterns are poor solutions to recurring design problems. Several empirical studies have highlighted their negative impact on program comprehension, maintainability, as well as fault-proneness. A variety of detection approaches have been proposed to identify their occurrences in source code. However, these approaches can identify only a subset of the occurrences and report large numbers of false positives and misses. Furthermore, a low agreement is generally observed among different approaches. Recent studies have shown the potential of machine-learning models to improve this situation. However, such algorithms require large sets of manually-produced training-data, which often limits their application in practice. In this paper, we present SMAD (SMart Aggregation of Anti-patterns Detectors), a machine-learning based ensemble method to aggregate various anti-patterns detection approaches on the basis of their internal detection rules. Thus, our method uses several detection tools to produce an improved prediction from a reasonable number of training examples. We implemented SMAD for the detection of two well known anti-patterns: God Class and Feature Envy. With the results of our experiments conducted on eight java projects, we show that: (1) our method clearly improves the so aggregated tools; (2) SMAD significantly outperforms other ensemble methods.