Feature Maps: A Comprehensible Software Representation for Design Pattern Detection
This work addresses the challenge of automating design pattern detection for software developers, offering a tool to uncover hidden information in code, though it is incremental as it builds on existing machine learning methods.
The paper tackles the problem of detecting design patterns in source code by introducing Feature Maps, a software representation based on micro-structures, and shows that it yields robust classifiers even with imbalanced data, achieving strong performance in evaluations across four patterns.
Design patterns are elegant and well-tested solutions to recurrent software development problems. They are the result of software developers dealing with problems that frequently occur, solving them in the same or a slightly adapted way. A pattern's semantics provide the intent, motivation, and applicability, describing what it does, why it is needed, and where it is useful. Consequently, design patterns encode a well of information. Developers weave this information into their systems whenever they use design patterns to solve problems. This work presents Feature Maps, a flexible human- and machine-comprehensible software representation based on micro-structures. Our algorithm, the Feature-Role Normalization, presses the high-dimensional, inhomogeneous vector space of micro-structures into a feature map. We apply these concepts to the problem of detecting instances of design patterns in source code. We evaluate our methodology on four design patterns, a wide range of balanced and imbalanced labeled training data, and compare classical machine learning (Random Forests) with modern deep learning approaches (Convolutional Neural Networks). Feature maps yield robust classifiers even under challenging settings of strongly imbalanced data distributions without sacrificing human comprehensibility. Results suggest that feature maps are an excellent addition in the software analysis toolbox that can reveal useful information hidden in the source code.