SE, AI, SY · Jan 11, 2025

Improving Requirements Classification with SMOTE-Tomek Preprocessing

arXiv:2501.06491v1 · 10 citations · h-index: 2
Originality: Synthesis-oriented
AI Analysis

This paper addresses class imbalance in requirements engineering data, but its contribution is incremental: it applies existing preprocessing techniques to a known dataset.

The study tackles class imbalance in requirements classification on the PROMISE dataset, reaching 76.16% accuracy with logistic regression, an improvement of 17.85 percentage points over the 58.31% baseline.

This study focuses on requirements engineering, applying the SMOTE-Tomek preprocessing technique, combined with stratified K-fold cross-validation, to address class imbalance in the PROMISE dataset. This dataset comprises 969 categorized requirements, classified into functional and non-functional types. The proposed approach enhances the representation of minority classes while maintaining the integrity of validation folds, leading to a notable improvement in classification accuracy. Logistic regression achieved 76.16% accuracy, significantly surpassing the baseline of 58.31%. These results highlight the applicability and efficiency of machine learning models as scalable and interpretable solutions.
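The abstract does not include code, so the following is only a minimal sketch of the general idea, not the authors' implementation. It uses pure Python on hypothetical 2-D toy points with made-up labels "F" and "NF" rather than text features from PROMISE. SMOTE creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours; Tomek-link cleaning then removes pairs of opposite-class points that are each other's nearest neighbour (this sketch assumes both members of a link are dropped). Resampling is applied to the training part of each stratified fold only, so the held-out fold keeps its original class distribution.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(X, y, minority, k=3, rng=None):
    """Oversample `minority` until it matches the largest class size.

    Assumes the minority class has at least two samples.
    """
    rng = rng or random.Random(0)
    pool = [x for x, label in zip(X, y) if label == minority]
    target = max(Counter(y).values())
    new_X, new_y = list(X), list(y)
    for _ in range(target - len(pool)):
        base = rng.choice(pool)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted((p for p in pool if p is not base),
                            key=lambda p: sq_dist(base, p))[:k]
        other = rng.choice(neighbours)
        t = rng.random()
        # Synthetic point on the segment between base and its neighbour
        new_X.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
        new_y.append(minority)
    return new_X, new_y


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link (mutual nearest neighbours
    that carry different labels)."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: sq_dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nn):
        if nn[j] == i and y[i] != y[j]:
            drop.update((i, j))
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def stratified_folds(y, n_folds, rng):
    """Split indices into folds that preserve the class proportions."""
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % n_folds].append(i)
    return folds


# Hypothetical imbalanced toy data: 40 "F" points and 10 "NF" points.
rng = random.Random(42)
X = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
     + [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(10)])
y = ["F"] * 40 + ["NF"] * 10

for fold, test_idx in enumerate(stratified_folds(y, 5, rng)):
    held_out = set(test_idx)
    train_X = [X[i] for i in range(len(X)) if i not in held_out]
    train_y = [y[i] for i in range(len(y)) if i not in held_out]
    # Resample the training part only; the held-out fold stays untouched.
    res_X, res_y = remove_tomek_links(*smote(train_X, train_y, "NF", rng=rng))
    print(fold, dict(Counter(train_y)), dict(Counter(res_y)))
```

In practice one would more likely use the `SMOTETomek` class from `imblearn.combine` inside an `imblearn.pipeline.Pipeline`, evaluated with scikit-learn's `StratifiedKFold`, which keeps the resampling confined to the training folds in the same way.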

