Cross-project Classification of Security-related Requirements
This work addresses the need for organizations to comply with standards such as GDPR and HIPAA by automating the identification of security requirements. The contribution is incremental, as it builds on existing classification methods.
The study tackles the problem of identifying security-related requirements in large specifications by training a classifier on heterogeneous data collected online. It shows that the approach is feasible, that performance improves when the pre-labeled data is revised for consistency, and that the way requirements are written has no effect on classifier accuracy.
We investigate the feasibility of using a classifier for security-related requirements trained on requirement specifications available online. Such a classifier is helpful when a large existing requirement specification does not differentiate between requirement types. Our work is motivated by the need to identify security requirements for the creation of security assurance cases, which are becoming a necessity for many organizations with new and upcoming standards like GDPR and HIPAA. We base our investigation on ten requirement specifications, randomly selected from Google search results and partially pre-labeled. To validate the model, we run 10-fold cross-validation on the data, where each specification constitutes a group. Our results indicate the feasibility of training a model from a heterogeneous data set that includes specifications from multiple domains and in different styles. However, performance benefits from revising the pre-labeled data for consistency. Additionally, we show that classifiers trained only on a specific specification type fare worse and that the way requirements are written has no impact on classifier accuracy.
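The grouped validation described above can be illustrated with a minimal sketch. It assumes that, with ten specifications and ten folds, each fold holds out one entire specification, so the classifier is always tested on a specification it has not seen. The abstract does not name the tooling used; the function `leave_one_group_out` and the toy labels below are hypothetical, and libraries such as scikit-learn offer comparable splitters (`GroupKFold`, `LeaveOneGroupOut`).

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) per distinct group.

    `groups[i]` names the specification that requirement i came from.
    Hypothetical helper for illustration, not the authors' implementation.
    """
    by_group = defaultdict(list)
    for index, group in enumerate(groups):
        by_group[group].append(index)
    for held_out in sorted(by_group):
        test = by_group[held_out]
        # Train on every requirement that belongs to a different specification.
        train = [i for g, idx in by_group.items() if g != held_out for i in idx]
        yield held_out, sorted(train), test


# Toy data (assumed): six requirements drawn from three specifications.
spec_of_requirement = ["spec_a", "spec_a", "spec_b", "spec_c", "spec_b", "spec_c"]
folds = list(leave_one_group_out(spec_of_requirement))

for held_out, train, test in folds:
    print(held_out, "train:", train, "test:", test)
```

Splitting by specification rather than by individual requirement keeps requirements from the same document out of both the training and the test set at once, which is what makes the evaluation cross-project.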