Improving Requirements Classification with SMOTE-Tomek Preprocessing
This work addresses class imbalance in requirements classification, a problem relevant to software developers, but the contribution is incremental: it applies existing preprocessing techniques to a known dataset.
The study tackles class imbalance in requirements classification on the PROMISE dataset, achieving 76.16% accuracy with logistic regression, an improvement of 17.85 percentage points over the 58.31% baseline.
This study applies the SMOTE-Tomek preprocessing technique, combined with stratified K-fold cross-validation, to address class imbalance in requirements classification on the PROMISE dataset. This dataset comprises 969 categorized requirements, classified into functional and non-functional types. The proposed approach enhances the representation of minority classes while maintaining the integrity of validation folds, leading to a notable improvement in classification accuracy: logistic regression achieved 76.16%, compared with the baseline of 58.31%. These results highlight the applicability and efficiency of machine learning models as scalable and interpretable solutions.