M. Ali Babar

SE
h-index13
53papers
2,478citations
Novelty31%
AI Score48

53 Papers

SEMar 16, 2022
On the Use of Fine-grained Vulnerable Code Statements for Software Vulnerability Assessment Models

Triet H. M. Le, M. Ali Babar

Many studies have developed Machine Learning (ML) approaches to detect Software Vulnerabilities (SVs) in functions and fine-grained code statements that cause such SVs. However, there is little work on leveraging such detection outputs for data-driven SV assessment to give information about exploitability, impact, and severity of SVs. The information is important to understand SVs and prioritize their fixing. Using large-scale data from 1,782 functions of 429 SVs in 200 real-world projects, we investigate ML models for automating function-level SV assessment tasks, i.e., predicting seven Common Vulnerability Scoring System (CVSS) metrics. We particularly study the value and use of vulnerable statements as inputs for developing the assessment models because SVs in functions are originated in these statements. We show that vulnerable statements are 5.8 times smaller in size, yet exhibit 7.5-114.5% stronger assessment performance (Matthews Correlation Coefficient (MCC)) than non-vulnerable statements. Incorporating context of vulnerable statements further increases the performance by up to 8.9% (0.64 MCC and 0.75 F1-Score). Overall, we provide the initial yet promising ML-based baselines for function-level SV assessment, paving the way for further research in this direction.

SEJul 15, 2024
Mitigating Data Imbalance for Software Vulnerability Assessment: Does Data Augmentation Help?

Triet H. M. Le, M. Ali Babar

Background: Software Vulnerability (SV) assessment is increasingly adopted to address the ever-increasing volume and complexity of SVs. Data-driven approaches have been widely used to automate SV assessment tasks, particularly the prediction of the Common Vulnerability Scoring System (CVSS) metrics such as exploitability, impact, and severity. SV assessment suffers from the imbalanced distributions of the CVSS classes, but such data imbalance has been hardly understood and addressed in the literature. Aims: We conduct a large-scale study to quantify the impacts of data imbalance and mitigate the issue for SV assessment through the use of data augmentation. Method: We leverage nine data augmentation techniques to balance the class distributions of the CVSS metrics. We then compare the performance of SV assessment models with and without leveraging the augmented data. Results: Through extensive experiments on 180k+ real-world SVs, we show that mitigating data imbalance can significantly improve the predictive performance of models for all the CVSS tasks, by up to 31.8% in Matthews Correlation Coefficient. We also discover that simple text augmentation like combining random text insertion, deletion, and replacement can outperform the baseline across the board. Conclusions: Our study provides the motivation and the first promising step toward tackling data imbalance for effective SV assessment.

SEJul 25, 2024
Automatic Data Labeling for Software Vulnerability Prediction Models: How Far Are We?

Triet H. M. Le, M. Ali Babar

Background: Software Vulnerability (SV) prediction needs large-sized and high-quality data to perform well. Current SV datasets mostly require expensive labeling efforts by experts (human-labeled) and thus are limited in size. Meanwhile, there are growing efforts in automatic SV labeling at scale. However, the fitness of auto-labeled data for SV prediction is still largely unknown. Aims: We quantitatively and qualitatively study the quality and use of the state-of-the-art auto-labeled SV data, D2A, for SV prediction. Method: Using multiple sources and manual validation, we curate clean SV data from human-labeled SV-fixing commits in two well-known projects for investigating the auto-labeled counterparts. Results: We discover that 50+% of the auto-labeled SVs are noisy (incorrectly labeled), and they hardly overlap with the publicly reported ones. Yet, SV prediction models utilizing the noisy auto-labeled SVs can perform up to 22% and 90% better in Matthews Correlation Coefficient and Recall, respectively, than the original models. We also reveal the promises and difficulties of applying noise-reduction methods for automatically addressing the noise in auto-labeled SV data to maximize the data utilization for SV prediction. Conclusions: Our study informs the benefits and challenges of using auto-labeled SVs, paving the way for large-scale SV prediction.

SEAug 1, 2024
A Qualitative Study on Using ChatGPT for Software Security: Perception vs. Practicality

M. Mehdi Kholoosi, M. Ali Babar, Roland Croft

Artificial Intelligence (AI) advancements have enabled the development of Large Language Models (LLMs) that can perform a variety of tasks with remarkable semantic understanding and accuracy. ChatGPT is one such LLM that has gained significant attention due to its impressive capabilities for assisting in various knowledge-intensive tasks. Due to the knowledge-intensive nature of engineering secure software, ChatGPT's assistance is expected to be explored for security-related tasks during the development/evolution of software. To gain an understanding of the potential of ChatGPT as an emerging technology for supporting software security, we adopted a two-fold approach. Initially, we performed an empirical study to analyse the perceptions of those who had explored the use of ChatGPT for security tasks and shared their views on Twitter. It was determined that security practitioners view ChatGPT as beneficial for various software security tasks, including vulnerability detection, information retrieval, and penetration testing. Secondly, we designed an experiment aimed at investigating the practicality of this technology when deployed as an oracle in real-world settings. In particular, we focused on vulnerability detection and qualitatively examined ChatGPT outputs for given prompts within this prominent software security task. Based on our analysis, responses from ChatGPT in this task are largely filled with generic security information and may not be appropriate for industry use. To prevent data leakage, we performed this analysis on a vulnerability dataset compiled after the OpenAI data cut-off date from real-world projects covering 40 distinct vulnerability types and 12 programming languages. We assert that the findings from this study would contribute to future research aimed at developing and evaluating LLMs dedicated to software security.

SEJul 24, 2024
Automated Code-centric Software Vulnerability Assessment: How Far Are We? An Empirical Study in C/C++

Anh The Nguyen, Triet Huynh Minh Le, M. Ali Babar

Background: The C and C++ languages hold significant importance in Software Engineering research because of their widespread use in practice. Numerous studies have utilized Machine Learning (ML) and Deep Learning (DL) techniques to detect software vulnerabilities (SVs) in the source code written in these languages. However, the application of these techniques in function-level SV assessment has been largely unexplored. SV assessment is increasingly crucial as it provides detailed information on the exploitability, impacts, and severity of security defects, thereby aiding in their prioritization and remediation. Aims: We conduct the first empirical study to investigate and compare the performance of ML and DL models, many of which have been used for SV detection, for function-level SV assessment in C/C++. Method: Using 9,993 vulnerable C/C++ functions, we evaluated the performance of six multi-class ML models and five multi-class DL models for the SV assessment at the function level based on the Common Vulnerability Scoring System (CVSS). We further explore multi-task learning, which can leverage common vulnerable code to predict all SV assessment outputs simultaneously in a single model, and compare the effectiveness and efficiency of this model type with those of the original multi-class models. Results: We show that ML has matching or even better performance compared to the multi-class DL models for function-level SV assessment with significantly less training time. Employing multi-task learning allows the DL models to perform significantly better, with an average of 8-22% increase in Matthews Correlation Coefficient (MCC). Conclusions: We distill the practices of using data-driven techniques for function-level SV assessment in C/C++, including the use of multi-task DL to balance efficiency and effectiveness. This can establish a strong foundation for future work in this area.

CLJul 3, 2023
Alert-ME: An Explainability-Driven Defense Against Adversarial Examples in Transformer-Based Text Classification

Bushra Sabir, Yansong Gao, Alsharif Abuadbba et al.

Transformer-based text classifiers such as BERT, RoBERTa, T5, and GPT have shown strong performance in natural language processing tasks but remain vulnerable to adversarial examples. These vulnerabilities raise significant security concerns, as small input perturbations can cause severe misclassifications. Existing robustness methods often require heavy computation or lack interpretability. This paper presents a unified framework called Explainability-driven Detection, Identification, and Transformation (EDIT) to strengthen inference-time defenses. EDIT integrates explainability tools, including attention maps and integrated gradients, with frequency-based features to automatically detect and identify adversarial perturbations while offering insight into model behavior. After detection, EDIT refines adversarial inputs using an optimal transformation process that leverages pre-trained embeddings and model feedback to replace corrupted tokens. To enhance security assurance, EDIT incorporates automated alerting mechanisms that involve human analysts when necessary. Beyond static defenses, EDIT also provides adaptive resilience by enforcing internal feature similarity and transforming inputs, thereby disrupting the attackers optimization process and limiting the effectiveness of adaptive adversarial attacks. Experiments using BERT and RoBERTa on IMDB, YELP, AGNEWS, and SST2 datasets against seven word substitution attacks demonstrate that EDIT achieves an average Fscore of 89.69 percent and balanced accuracy of 89.70 percent. Compared to four state-of-the-art defenses, EDIT improves balanced accuracy by 1.22 times and F1-score by 1.33 times while being 83 times faster in feature extraction. The framework provides robust, interpretable, and efficient protection against both standard, zero-day, and adaptive adversarial threats in text classification models.

LGMar 30
ORACAL: A Robust and Explainable Multimodal Framework for Smart Contract Vulnerability Detection with Causal Graph Enrichment

Tran Duong Minh Dai, Triet Huynh Minh Le, M. Ali Babar et al.

Although Graph Neural Networks (GNNs) have shown promise for smart contract vulnerability detection, they still face significant limitations. Homogeneous graph models fail to capture the interplay between control flow and data dependencies, while heterogeneous graph approaches often lack deep semantic understanding, leaving them susceptible to adversarial attacks. Moreover, most black-box models fail to provide explainable evidence, hindering trust in professional audits. To address these challenges, we propose ORACAL (Observable RAG-enhanced Analysis with CausAL reasoning), a heterogeneous multimodal graph learning framework that integrates Control Flow Graph (CFG), Data Flow Graph (DFG), and Call Graph (CG). ORACAL selectively enriches critical subgraphs with expert-level security context from Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs), and employs a causal attention mechanism to disentangle true vulnerability indicators from spurious correlations. For transparency, the framework adopts PGExplainer to generate subgraph-level explanations identifying vulnerability triggering paths. Experiments on large-scale datasets demonstrate that ORACAL achieves state-of-the-art performance, outperforming MANDO-HGT, MTVHunter, GNN-SC, and SCVHunter by up to 39.6 percentage points, with a peak Macro F1 of 91.28% on the primary benchmark. ORACAL maintains strong generalization on out-of-distribution datasets with 91.8% on CGT Weakness and 77.1% on DAppScan. In explainability evaluation, PGExplainer achieves 32.51% Mean Intersection over Union (MIoU) against manually annotated vulnerability triggering paths. Under adversarial attacks, ORACAL limits performance degradation to approximately 2.35% F1 decrease with an Attack Success Rate (ASR) of only 3%, surpassing SCVHunter and MANDO-HGT which exhibit ASRs ranging from 10.91% to 18.73%.

SEDec 29, 2025
Securing the AI Supply Chain: What Can We Learn From Developer-Reported Security Issues and Solutions of AI Projects?

The Anh Nguyen, Triet Huynh Minh Le, M. Ali Babar

The rapid growth of Artificial Intelligence (AI) models and applications has led to an increasingly complex security landscape. Developers of AI projects must contend not only with traditional software supply chain issues but also with novel, AI-specific security threats. However, little is known about what security issues are commonly encountered and how they are resolved in practice. This gap hinders the development of effective security measures for each component of the AI supply chain. We bridge this gap by conducting an empirical investigation of developer-reported issues and solutions, based on discussions from Hugging Face and GitHub. To identify security-related discussions, we develop a pipeline that combines keyword matching with an optimal fine-tuned distilBERT classifier, which achieved the best performance in our extensive comparison of various deep learning and large language models. This pipeline produces a dataset of 312,868 security discussions, providing insights into the security reporting practices of AI applications and projects. We conduct a thematic analysis of 753 posts sampled from our dataset and uncover a fine-grained taxonomy of 32 security issues and 24 solutions across four themes: (1) System and Software, (2) External Tools and Ecosystem, (3) Model, and (4) Data. We reveal that many security issues arise from the complex dependencies and black-box nature of AI components. Notably, challenges related to Models and Data often lack concrete solutions. Our insights can offer evidence-based guidance for developers and researchers to address real-world security threats across the AI supply chain.

CRDec 21, 2021Code
Well Begun is Half Done: An Empirical Study of Exploitability & Impact of Base-Image Vulnerabilities

Mubin Ul Haque, M. Ali Babar

Container technology, (e.g., Docker) is being widely adopted for deploying software infrastructures or applications in the form of container images. Security vulnerabilities in the container images are a primary concern for developing containerized software. Exploitation of the vulnerabilities could result in disastrous impact, such as loss of confidentiality, integrity, and availability of containerized software. Understanding the exploitability and impact characteristics of vulnerabilities can help in securing the configuration of containerized software. However, there is a lack of research aimed at empirically identifying and understanding the exploitability and impact of vulnerabilities in container images. We carried out an empirical study to investigate the exploitability and impact of security vulnerabilities in base-images and their prevalence in open-source containerized software. We considered base-images since container images are built from base-images that provide all the core functionalities to build and operate containerized software. We discovered and characterized the exploitability and impact of security vulnerabilities in 261 base-images, which are the origin of 4,681 actively maintained official container images in the largest container registry, i.e., Docker Hub. To characterize the prevalence of vulnerable base-images in real-world projects, we analysed 64,579 containerized software from GitHub. Our analysis of a set of $1,983$ unique base-image security vulnerabilities revealed 13 novel findings. These findings are expected to help developers to understand the potential security problems related to base-images and encourage them to investigate base-images from security perspective before developing their applications.

SESep 19, 2017Code
Understanding the Heterogeneity of Contributors in Bug Bounty Programs

Hideaki Hata, Mingyu Guo, M. Ali Babar

Background: While bug bounty programs are not new in software development, an increasing number of companies, as well as open source projects, rely on external parties to perform the security assessment of their software for reward. However, there is relatively little empirical knowledge about the characteristics of bug bounty program contributors. Aim: This paper aims to understand those contributors by highlighting the heterogeneity among them. Method: We analyzed the histories of 82 bug bounty programs and 2,504 distinct bug bounty contributors, and conducted a quantitative and qualitative survey. Results: We found that there are project-specific and non-specific contributors who have different motivations for contributing to the products and organizations. Conclusions: Our findings provide insights to make bug bounty programs better and for further studies of new software development roles.

SEDec 11, 2023
METAL: Metamorphic Testing Framework for Analyzing Large-Language Model Qualities

Sangwon Hyun, Mingyu Guo, M. Ali Babar

Large-Language Models (LLMs) have shifted the paradigm of natural language data processing. However, their black-boxed and probabilistic characteristics can lead to potential risks in the quality of outputs in diverse LLM applications. Recent studies have tested Quality Attributes (QAs), such as robustness or fairness, of LLMs by generating adversarial input texts. However, existing studies have limited their coverage of QAs and tasks in LLMs and are difficult to extend. Additionally, these studies have only used one evaluation metric, Attack Success Rate (ASR), to assess the effectiveness of their approaches. We propose a MEtamorphic Testing for Analyzing LLMs (METAL) framework to address these issues by applying Metamorphic Testing (MT) techniques. This approach facilitates the systematic testing of LLM qualities by defining Metamorphic Relations (MRs), which serve as modularized evaluation metrics. The METAL framework can automatically generate hundreds of MRs from templates that cover various QAs and tasks. In addition, we introduced novel metrics that integrate the ASR method into the semantic qualities of text to assess the effectiveness of MRs accurately. Through the experiments conducted with three prominent LLMs, we have confirmed that the METAL framework effectively evaluates essential QAs on primary LLM tasks and reveals the quality risks in LLMs. Moreover, the newly proposed metrics can guide the optimal MRs for testing each task and suggest the most effective method for generating MRs.

SEApr 26, 2024
Software Vulnerability Prediction in Low-Resource Languages: An Empirical Study of CodeBERT and ChatGPT

Triet H. M. Le, M. Ali Babar, Tung Hoang Thai

Background: Software Vulnerability (SV) prediction in emerging languages is increasingly important to ensure software security in modern systems. However, these languages usually have limited SV data for developing high-performing prediction models. Aims: We conduct an empirical study to evaluate the impact of SV data scarcity in emerging languages on the state-of-the-art SV prediction model and investigate potential solutions to enhance the performance. Method: We train and test the state-of-the-art model based on CodeBERT with and without data sampling techniques for function-level and line-level SV prediction in three low-resource languages - Kotlin, Swift, and Rust. We also assess the effectiveness of ChatGPT for low-resource SV prediction given its recent success in other domains. Results: Compared to the original work in C/C++ with large data, CodeBERT's performance of function-level and line-level SV prediction significantly declines in low-resource languages, signifying the negative impact of data scarcity. Regarding remediation, data sampling techniques fail to improve CodeBERT; whereas, ChatGPT showcases promising results, substantially enhancing predictive performance by up to 34.4% for the function level and up to 53.5% for the line level. Conclusion: We have highlighted the challenge and made the first promising step for low-resource SV prediction, paving the way for future research in this direction.

SEFeb 4, 2025
LLMSecConfig: An LLM-Based Approach for Fixing Software Container Misconfigurations

Ziyang Ye, Triet Huynh Minh Le, M. Ali Babar

Security misconfigurations in Container Orchestrators (COs) can pose serious threats to software systems. While Static Analysis Tools (SATs) can effectively detect these security vulnerabilities, the industry currently lacks automated solutions capable of fixing these misconfigurations. The emergence of Large Language Models (LLMs), with their proven capabilities in code understanding and generation, presents an opportunity to address this limitation. This study introduces LLMSecConfig, an innovative framework that bridges this gap by combining SATs with LLMs. Our approach leverages advanced prompting techniques and Retrieval-Augmented Generation (RAG) to automatically repair security misconfigurations while preserving operational functionality. Evaluation of 1,000 real-world Kubernetes configurations achieved a 94\% success rate while maintaining a low rate of introducing new misconfigurations. Our work makes a promising step towards automated container security management, reducing the manual effort required for configuration maintenance.

SEApr 6, 2025
DDPT: Diffusion-Driven Prompt Tuning for Large Language Model Code Generation

Jinyang Li, Sangwon Hyun, M. Ali Babar

Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation. However, the quality of the generated code is heavily dependent on the structure and composition of the prompts used. Crafting high-quality prompts is a challenging task that requires significant knowledge and skills of prompt engineering. To advance the automation support for the prompt engineering for LLM-based code generation, we propose a novel solution Diffusion-Driven Prompt Tuning (DDPT) that learns how to generate optimal prompt embedding from Gaussian Noise to automate the prompt engineering for code generation. We evaluate the feasibility of diffusion-based optimization and abstract the optimal prompt embedding as a directional vector toward the optimal embedding. We use the code generation loss given by the LLMs to help the diffusion model capture the distribution of optimal prompt embedding during training. The trained diffusion model can build a path from the noise distribution to the optimal distribution at the sampling phrase, the evaluation result demonstrates that DDPT helps improve the prompt optimization for code generation.

SEDec 9, 2024
MVD: A Multi-Lingual Software Vulnerability Detection Framework

Boyu Zhang, Triet H. M. Le, M. Ali Babar

Software vulnerabilities can result in catastrophic cyberattacks that increasingly threaten business operations. Consequently, ensuring the safety of software systems has become a paramount concern for both private and public sectors. Recent literature has witnessed increasing exploration of learning-based approaches for software vulnerability detection. However, a key limitation of these techniques is their primary focus on a single programming language, such as C/C++, which poses constraints considering the polyglot nature of modern software projects. Further, there appears to be an oversight in harnessing the synergies of vulnerability knowledge across varied languages, potentially underutilizing the full capabilities of these methods. To address the aforementioned issues, we introduce MVD - an innovative multi-lingual vulnerability detection framework. This framework acquires the ability to detect vulnerabilities across multiple languages by concurrently learning from vulnerability data of various languages, which are curated by our specialized pipeline. We also incorporate incremental learning to enable the detection capability of MVD to be extended to new languages, thus augmenting its practical utility. Extensive experiments on our curated dataset of more than 11K real-world multi-lingual vulnerabilities substantiate that our framework significantly surpasses state-of-the-art methods in multi-lingual vulnerability detection by 83.7% to 193.6% in PR-AUC. The results also demonstrate that MVD detects vulnerabilities well for new languages without compromising the detection performance of previously trained languages, even when training data for the older languages is unavailable. Overall, our findings motivate and pave the way for the prediction of multi-lingual vulnerabilities in modern software systems.

SEAug 6, 2025
Experimental Analysis of Productive Interaction Strategy with ChatGPT: User Study on Function and Project-level Code Generation Tasks

Sangwon Hyun, Hyunjun Kim, Jinhyuk Jang et al.

The application of Large Language Models (LLMs) is growing in the productive completion of Software Engineering tasks. Yet, studies investigating the productive prompting techniques often employed a limited problem space, primarily focusing on well-known prompting patterns and mainly targeting function-level SE practices. We identify significant gaps in real-world workflows that involve complexities beyond class-level (e.g., multi-class dependencies) and different features that can impact Human-LLM Interactions (HLIs) processes in code generation. To address these issues, we designed an experiment that comprehensively analyzed the HLI features regarding the code generation productivity. Our study presents two project-level benchmark tasks, extending beyond function-level evaluations. We conducted a user study with 36 participants from diverse backgrounds, asking them to solve the assigned tasks by interacting with the GPT assistant using specific prompting patterns. We also examined the participants' experience and their behavioral features during interactions by analyzing screen recordings and GPT chat logs. Our statistical and empirical investigation revealed (1) that three out of 15 HLI features significantly impacted the productivity in code generation; (2) five primary guidelines for enhancing productivity for HLI processes; and (3) a taxonomy of 29 runtime and logic errors that can occur during HLI processes, along with suggested mitigation plans.

SEJul 8, 2025
Search-based Selection of Metamorphic Relations for Optimized Robustness Testing of Large Language Models

Sangwon Hyun, Shaukat Ali, M. Ali Babar

Assessing the trustworthiness of Large Language Models (LLMs), such as robustness, has garnered significant attention. Recently, metamorphic testing that defines Metamorphic Relations (MRs) has been widely applied to evaluate the robustness of LLM executions. However, the MR-based robustness testing still requires a scalable number of MRs, thereby necessitating the optimization of selecting MRs. Most extant LLM testing studies are limited to automatically generating test cases (i.e., MRs) to enhance failure detection. Additionally, most studies only considered a limited test space of single perturbation MRs in their evaluation of LLMs. In contrast, our paper proposes a search-based approach for optimizing the MR groups to maximize failure detection and minimize the LLM execution cost. Moreover, our approach covers the combinatorial perturbations in MRs, facilitating the expansion of test space in the robustness assessment. We have developed a search process and implemented four search algorithms: Single-GA, NSGA-II, SPEA2, and MOEA/D with novel encoding to solve the MR selection problem in the LLM robustness testing. We conducted comparative experiments on the four search algorithms along with a random search, using two major LLMs with primary Text-to-Text tasks. Our statistical and empirical investigation revealed two key findings: (1) the MOEA/D algorithm performed the best in optimizing the MR space for LLM robustness testing, and (2) we identified silver bullet MRs for the LLM robustness testing, which demonstrated dominant capabilities in confusing LLMs across different Text-to-Text tasks. In LLM robustness assessment, our research sheds light on the fundamental problem for optimized testing and provides insights into search-based solutions.

SEJun 28, 2024
Systematic Literature Review on Application of Learning-based Approaches in Continuous Integration

Ali Kazemi Arani, Triet Huynh Minh Le, Mansooreh Zahedi et al.

Context: Machine learning (ML) and deep learning (DL) analyze raw data to extract valuable insights in specific phases. The rise of continuous practices in software projects emphasizes automating Continuous Integration (CI) with these learning-based methods, while the growing adoption of such approaches underscores the need for systematizing knowledge. Objective: Our objective is to comprehensively review and analyze existing literature concerning learning-based methods within the CI domain. We endeavour to identify and analyse various techniques documented in the literature, emphasizing the fundamental attributes of training phases within learning-based solutions in the context of CI. Method: We conducted a Systematic Literature Review (SLR) involving 52 primary studies. Through statistical and thematic analyses, we explored the correlations between CI tasks and the training phases of learning-based methodologies across the selected studies, encompassing a spectrum from data engineering techniques to evaluation metrics. Results: This paper presents an analysis of the automation of CI tasks utilizing learning-based methods. We identify and analyze nine types of data sources, four steps in data preparation, four feature types, nine subsets of data features, five approaches for hyperparameter selection and tuning, and fifteen evaluation metrics. Furthermore, we discuss the latest techniques employed, existing gaps in CI task automation, and the characteristics of the utilized learning-based techniques. Conclusion: This study provides a comprehensive overview of learning-based methods in CI, offering valuable insights for researchers and practitioners developing CI task automation. It also highlights the need for further research to advance these methods in CI.

SEJan 20, 2024
Are Latent Vulnerabilities Hidden Gems for Software Vulnerability Prediction? An Empirical Study

Triet H. M. Le, Xiaoning Du, M. Ali Babar

Collecting relevant and high-quality data is integral to the development of effective Software Vulnerability (SV) prediction models. Most of the current SV datasets rely on SV-fixing commits to extract vulnerable functions and lines. However, none of these datasets have considered latent SVs existing between the introduction and fix of the collected SVs. There is also little known about the usefulness of these latent SVs for SV prediction. To bridge these gaps, we conduct a large-scale study on the latent vulnerable functions in two commonly used SV datasets and their utilization for function-level and line-level SV predictions. Leveraging the state-of-the-art SZZ algorithm, we identify more than 100k latent vulnerable functions in the studied datasets. We find that these latent functions can increase the number of SVs by 4x on average and correct up to 5k mislabeled functions, yet they have a noise level of around 6%. Despite the noise, we show that the state-of-the-art SV prediction model can significantly benefit from such latent SVs. The improvements are up to 24.5% in the performance (F1-Score) of function-level SV predictions and up to 67% in the effectiveness of localizing vulnerable lines. Overall, our study presents the first promising step toward the use of latent SVs to improve the quality of SV datasets and enhance the performance of SV prediction tasks.

SEFeb 18, 2022
Why, How and Where of Delays in Software Security Patch Management: An Empirical Investigation in the Healthcare Sector

Nesara Dissanayake, Mansooreh Zahedi, Asangi Jayatilaka et al.

Numerous security attacks that resulted in devastating consequences can be traced back to a delay in applying a security patch. Despite the criticality of timely patch application, not much is known about why and how delays occur when applying security patches in practice, and how the delays can be mitigated. Based on longitudinal data collected from 132 delayed patching tasks over a period of four years and observations of patch meetings involving eight teams from two organisations in the healthcare domain, and using quantitative and qualitative data analysis approaches, we identify a set of reasons relating to technology, people and organisation as key explanations that cause delays in patching. Our findings also reveal that the most prominent cause of delays is attributable to coordination delays in the patch management process and a majority of delays occur during the patch deployment phase. Towards mitigating the delays, we describe a set of strategies employed by the studied practitioners. This research serves as the first step towards understanding the practical reasons for delays and possible mitigation strategies in vulnerability patch management. Our findings provide useful insights for practitioners to understand what and where improvement is needed in the patch management process and guide them towards taking timely actions against potential attacks. Also, our findings help researchers to invest effort into designing and developing computer-supported tools to better support a timely security patch management process.

CRJan 22, 2022
On the Privacy of Mental Health Apps: An Empirical Investigation and its Implications for Apps Development

Leonardo Horn Iwaya, M. Ali Babar, Awais Rashid et al.

An increasing number of mental health services are offered through mobile systems, a paradigm called mHealth. Although there is an unprecedented growth in the adoption of mHealth systems, partly due to the COVID-19 pandemic, concerns about data privacy risks due to security breaches are also increasing. Whilst some studies have analyzed mHealth apps from different angles, including security, there is relatively little evidence for data privacy issues that may exist in mHealth apps used for mental health services, whose recipients can be particularly vulnerable. This paper reports an empirical study aimed at systematically identifying and understanding data privacy incorporated in mental health apps. We analyzed 27 top-ranked mental health apps from Google Play Store. Our methodology enabled us to perform an in-depth privacy analysis of the apps, covering static and dynamic analysis, data sharing behaviour, server-side tests, privacy impact assessment requests, and privacy policy evaluation. Furthermore, we mapped the findings to the LINDDUN threat taxonomy, describing how threats manifest on the studied apps. The findings reveal important data privacy issues such as unnecessary permissions, insecure cryptography implementations, and leaks of personal data and credentials in logs and web requests. There is also a high risk of user profiling as the apps' development do not provide foolproof mechanisms against linkability, detectability and identifiability. Data sharing among third parties and advertisers in the current apps' ecosystem aggravates this situation. Based on the empirical findings of this study, we provide recommendations to be considered by different stakeholders of mHealth apps in general and apps developers in particular. [...]

CRJan 12, 2022
Security for Machine Learning-based Software Systems: a survey of threats, practices and challenges

Huaming Chen, M. Ali Babar

The rapid development of Machine Learning (ML) has demonstrated superior performance in many areas, such as computer vision, video and speech recognition. It has now been increasingly leveraged in software systems to automate the core tasks. However, how to securely develop the machine learning-based modern software systems (MLBSS) remains a big challenge, for which the insufficient consideration will largely limit its application in safety-critical domains. One concern is that the present MLBSS development tends to be rush, and the latent vulnerabilities and privacy issues exposed to external users and attackers will be largely neglected and hard to be identified. Additionally, machine learning-based software systems exhibit different liabilities towards novel vulnerabilities at different development stages from requirement analysis to system maintenance, due to its inherent limitations from the model and data and the external adversary capabilities. The successful generation of such intelligent systems will thus solicit dedicated efforts jointly from different research areas, i.e., software engineering, system security and machine learning. Most of the recent works regarding the security issues for ML have a strong focus on the data and models, which has brought adversarial attacks into consideration. In this work, we consider that security for machine learning-based software systems may arise from inherent system defects or external adversarial attacks, and the secure development practices should be taken throughout the whole lifecycle. While machine learning has become a new threat domain for existing software engineering practices, there is no such review work covering the topic. Overall, we present a holistic review regarding the security for MLBSS, which covers a systematic understanding from a structure review of three distinct aspects in terms of security threats...

CRDec 21, 2021
KGSecConfig: A Knowledge Graph Based Approach for Secured Container Orchestrator Configuration

Mubin Ul Haque, M. Mehdi Kholoosi, M. Ali Babar

Container Orchestrator (CO) is a vital technology for managing clusters of containers, which may form a virtualized infrastructure for developing and operating software systems. Like any other software system, securing CO is critical, but can be quite challenging task due to large number of configurable options. Manual configuration is not only knowledge intensive and time consuming, but also is error prone. For automating security configuration of CO, we propose a novel Knowledge Graph based Security Configuration, KGSecConfig, approach. Our solution leverages keyword and learning models to systematically capture, link, and correlate heterogeneous and multi-vendor configuration space in a unified structure for supporting automation of security configuration of CO. We implement KGSecConfig on Kubernetes, Docker, Azure, and VMWare to build secured configuration knowledge graph. Our evaluation results show 0.98 and 0.94 accuracy for keyword and learning-based secured configuration option and concept extraction, respectively. We also demonstrate the utilization of the knowledge graph for automated misconfiguration mitigation in a Kubernetes cluster. We assert that our knowledge graph based approach can help in addressing several challenges, e.g., misconfiguration of security, associated with manually configuring the security of CO.

SEDec 20, 2021
An Investigation into Inconsistency of Software Vulnerability Severity across Data Sources

Roland Croft, M. Ali Babar, Li Li

Software Vulnerability (SV) severity assessment is a vital task for informing SV remediation and triage. Ranking of SV severity scores is often used to advise prioritization of patching efforts. However, severity assessment is a difficult and subjective manual task that relies on expertise, knowledge, and standardized reporting schemes. Consequently, different data sources that perform independent analysis may provide conflicting severity rankings. Inconsistency across these data sources affects the reliability of severity assessment data, and can consequently impact SV prioritization and fixing. In this study, we investigate severity ranking inconsistencies over the SV reporting lifecycle. Our analysis helps characterize the nature of this problem, identify correlated factors, and determine the impacts of inconsistency on downstream tasks. Our findings observe that SV severity often lacks consideration or is underestimated during initial reporting, and such SVs consequently receive lower prioritization. We identify six potential attributes that are correlated to this misjudgment, and show that inconsistency in severity reporting schemes can severely degrade the performance of downstream severity prediction by up to 77%. Our findings help raise awareness of SV severity data inconsistencies and draw attention to this data quality problem. These insights can help developers better consider SV severity data sources, and improve the reliability of consequent SV prioritization. Furthermore, we encourage researchers to provide more attention to SV severity data selection.

CRDec 20, 2021
Systematic Literature Review on Cyber Situational Awareness Visualizations

Liuyue Jiang, Asangi Jayatilaka, Mehwish Nasim et al.

The dynamics of cyber threats are increasingly complex, making it more challenging than ever for organizations to obtain in-depth insights into their cyber security status. Therefore, organizations rely on Cyber Situational Awareness (CSA) to support them in better understanding the threats and associated impacts of cyber events. Due to the heterogeneity and complexity of cyber security data, often with multidimensional attributes, sophisticated visualization techniques are needed to achieve CSA. However, there have been no previous attempts to systematically review and analyze the scientific literature on CSA visualizations. In this paper, we systematically select and review 54 publications that discuss visualizations to support CSA. We extract data from these papers to identify key stakeholders, information types, data sources, and visualization techniques. Furthermore, we analyze the level of CSA supported by the visualizations, alongside examining the maturity of the visualizations, challenges, and practices related to CSA visualizations to prepare a full analysis of the current state of CSA in an organizational context. Our results reveal certain gaps in CSA visualizations. For instance, the largest focus is on operational-level staff, and there is a clear lack of visualizations targeting other types of stakeholders such as managers, higher-level decision makers, and non-expert users. Most papers focus on threat information visualization, and there is a dearth of papers that visualize impact information, response plans, and information shared within teams. Based on the results that highlight the important concerns in CSA visualizations, we recommend a list of future research directions.

CRDec 12, 2021
Evaluation of Security Training and Awareness Programs: Review of Current Practices and Guideline

Asangi Jayatilaka, Nathan Beu, Irina Baetu et al.

Evaluating the effectiveness of security awareness and training programs is critical for minimizing organizations' human security risk. Based on a literature review and industry interviews, we discuss current practices and devise guidelines for measuring the effectiveness of security training and awareness initiatives used by organizations

SESep 13, 2021
Data Preparation for Software Vulnerability Prediction: A Systematic Literature Review

Roland Croft, Yongzheng Xie, M. Ali Babar

Software Vulnerability Prediction (SVP) is a data-driven technique for software quality assurance that has recently gained considerable attention in the Software Engineering research community. However, the difficulties of preparing Software Vulnerability (SV) related data is considered as the main barrier to industrial adoption of SVP approaches. Given the increasing, but dispersed, literature on this topic, it is needed and timely to systematically select, review, and synthesize the relevant peer-reviewed papers reporting the existing SV data preparation techniques and challenges. We have carried out a Systematic Literature Review (SLR) of SVP research in order to develop a systematized body of knowledge of the data preparation challenges, solutions, and the needed research. Our review of the 61 relevant papers has enabled us to develop a taxonomy of data preparation for SVP related challenges. We have analyzed the identified challenges and available solutions using the proposed taxonomy. Our analysis of the state of the art has enabled us identify the opportunities for future research. This review also provides a set of recommendations for researchers and practitioners of SVP approaches.

CRSep 9, 2021
Automated Security Assessment for the Internet of Things

Xuanyu Duan, Mengmeng Ge, Triet H. M. Le et al.

Internet of Things (IoT) based applications face an increasing number of potential security risks, which need to be systematically assessed and addressed. Expert-based manual assessment of IoT security is a predominant approach, which is usually inefficient. To address this problem, we propose an automated security assessment framework for IoT networks. Our framework first leverages machine learning and natural language processing to analyze vulnerability descriptions for predicting vulnerability metrics. The predicted metrics are then input into a two-layered graphical security model, which consists of an attack graph at the upper layer to present the network connectivity and an attack tree for each node in the network at the bottom layer to depict the vulnerability information. This security model automatically assesses the security of the IoT network by capturing potential attack paths. We evaluate the viability of our approach using a proof-of-concept smart building system model which contains a variety of real-world IoT devices and potential vulnerabilities. Our evaluation of the proposed framework demonstrates its effectiveness in terms of automatically predicting the vulnerability metrics of new vulnerabilities with more than 90% accuracy, on average, and identifying the most vulnerable attack paths within an IoT network. The produced assessment results can serve as a guideline for cybersecurity professionals to take further actions and mitigate risks in a timely manner.

SEAug 18, 2021
DeepCVA: Automated Commit-level Vulnerability Assessment with Deep Multi-task Learning

Triet H. M. Le, David Hin, Roland Croft et al.

It is increasingly suggested to identify Software Vulnerabilities (SVs) in code commits to give early warnings about potential security risks. However, there is a lack of effort to assess vulnerability-contributing commits right after they are detected to provide timely information about the exploitability, impact and severity of SVs. Such information is important to plan and prioritize the mitigation for the identified SVs. We propose a novel Deep multi-task learning model, DeepCVA, to automate seven Commit-level Vulnerability Assessment tasks simultaneously based on Common Vulnerability Scoring System (CVSS) metrics. We conduct large-scale experiments on 1,229 vulnerability-contributing commits containing 542 different SVs in 246 real-world software projects to evaluate the effectiveness and efficiency of our model. We show that DeepCVA is the best-performing model with 38% to 59.8% higher Matthews Correlation Coefficient than many supervised and unsupervised baseline models. DeepCVA also requires 6.3 times less training and validation time than seven cumulative assessment models, leading to significantly less model maintenance cost as well. Overall, DeepCVA presents the first effective and efficient solution to automatically assess SVs early in software systems.

SEJul 29, 2021
An Empirical Study of Developers' Discussions about Security Challenges of Different Programming Languages

Roland Croft, Yongzheng Xie, Mansooreh Zahedi et al.

Given programming languages can provide different types and levels of security support, it is critically important to consider security aspects while selecting programming languages for developing software systems. Inadequate consideration of security in the choice of a programming language may lead to potential ramifications for secure development. Whilst theoretical analysis of the supposed security properties of different programming languages has been conducted, there has been relatively little effort to empirically explore the actual security challenges experienced by developers. We have performed a large-scale study of the security challenges of 15 programming languages by quantitatively and qualitatively analysing the developers' discussions from Stack Overflow and GitHub. By leveraging topic modelling, we have derived a taxonomy of 18 major security challenges for 6 topic categories. We have also conducted comparative analysis to understand how the identified challenges vary regarding the different programming languages and data sources. Our findings suggest that the challenges and their characteristics differ substantially for different programming languages and data sources, i.e., Stack Overflow and GitHub. The findings provide evidence-based insights and understanding of security challenges related to different programming languages to software professionals (i.e., practitioners or researchers). The reported taxonomy of security challenges can assist both practitioners and researchers in better understanding and traversing the secure development landscape. This study highlights the importance of the choice of technology, e.g., programming language, in secure software engineering. Hence, the findings are expected to motivate practitioners to consider the potential impact of the choice of programming languages on software security.

SEJul 18, 2021
A Survey on Data-driven Software Vulnerability Assessment and Prioritization

Triet H. M. Le, Huaming Chen, M. Ali Babar

Software Vulnerabilities (SVs) are increasing in complexity and scale, posing great security risks to many software systems. Given the limited resources in practice, SV assessment and prioritization help practitioners devise optimal SV mitigation plans based on various SV characteristics. The surges in SV data sources and data-driven techniques such as Machine Learning and Deep Learning have taken SV assessment and prioritization to the next level. Our survey provides a taxonomy of the past research efforts and highlights the best practices for data-driven SV assessment and prioritization. We also discuss the current limitations and propose potential solutions to address such issues.

SEJul 5, 2021
An Empirical Study of Rule-Based and Learning-Based Approaches for Static Application Security Testing

Roland Croft, Dominic Newlands, Ziyu Chen et al.

Background: Static Application Security Testing (SAST) tools purport to assist developers in detecting security issues in source code. These tools typically use rule-based approaches to scan source code for security vulnerabilities. However, due to the significant shortcomings of these tools (i.e., high false positive rates), learning-based approaches for Software Vulnerability Prediction (SVP) are becoming a popular approach. Aims: Despite the similar objectives of these two approaches, their comparative value is unexplored. We provide an empirical analysis of SAST tools and SVP models, to identify their relative capabilities for source code security analysis. Method: We evaluate the detection and assessment performance of several common SAST tools and SVP models on a variety of vulnerability datasets. We further assess the viability and potential benefits of combining the two approaches. Results: SAST tools and SVP models provide similar detection capabilities, but SVP models exhibit better overall performance for both detection and assessment. Unification of the two approaches is difficult due to lacking synergies. Conclusions: Our study generates 12 main findings which provide insights into the capabilities and synergy of these two approaches. Through these observations we provide recommendations for use and improvement.

CRApr 24, 2021
A Review on C3I Systems' Security: Vulnerabilities, Attacks, and Countermeasures

Hussain Ahmad, Isuru Dharmadasa, Faheem Ullah et al.

Command, Control, Communication, and Intelligence (C3I) systems are increasingly used in critical civil and military domains for achieving information superiority, operational efficacy, and greater situational awareness. Unlike traditional systems facing widespread cyber-attacks, the sensitive nature of C3I tactical operations make their cybersecurity a critical concern. For instance, tampering or intercepting confidential information in military battlefields not only damages C3I operations, but also causes irreversible consequences such as loss of human lives and mission failures. Therefore, C3I systems have become a focal point for cyber adversaries. Moreover, technological advancements and modernization of C3I systems have significantly increased the potential risk of cyber-attacks on C3I systems. Consequently, cyber adversaries leverage highly sophisticated attack vectors to exploit security vulnerabilities in C3I systems. Despite the burgeoning significance of cybersecurity for C3I systems, the existing literature lacks a comprehensive review to systematize the body of knowledge on C3I systems' security. Therefore, in this paper, we have gathered, analyzed, and synthesized the state-of-the-art on the cybersecurity of C3I systems. In particular, this paper has identified security vulnerabilities, attack vectors, and countermeasures/defenses for C3I systems. Furthermore, our survey has enabled us to: (i) propose a taxonomy for security vulnerabilities, attack vectors and countermeasures; (ii) interrelate attack vectors with security vulnerabilities and countermeasures; and (iii) propose future research directions for advancing the state-of-the-art on the cybersecurity of C3I systems.

SEMar 21, 2021
Automated Software Vulnerability Assessment with Concept Drift

Triet H. M. Le, Bushra Sabir, M. Ali Babar

Software Engineering researchers are increasingly using Natural Language Processing (NLP) techniques to automate Software Vulnerabilities (SVs) assessment using the descriptions in public repositories. However, the existing NLP-based approaches suffer from concept drift. This problem is caused by a lack of proper treatment of new (out-of-vocabulary) terms for the evaluation of unseen SVs over time. To perform automated SVs assessment with concept drift using SVs' descriptions, we propose a systematic approach that combines both character and word features. The proposed approach is used to predict seven Vulnerability Characteristics (VCs). The optimal model of each VC is selected using our customized time-based cross-validation method from a list of eight NLP representations and six well-known Machine Learning models. We have used the proposed approach to conduct large-scale experiments on more than 100,000 SVs in the National Vulnerability Database (NVD). The results show that our approach can effectively tackle the concept drift issue of the SVs' descriptions reported from 2000 to 2018 in NVD even without retraining the model. In addition, our approach performs competitively compared to the existing word-only method. We also investigate how to build compact concept-drift-aware models with much fewer features and give some recommendations on the choice of classifiers and NLP representations for SVs assessment.

SEMar 15, 2021
Challenges and solutions when adopting DevSecOps: A systematic review

Roshan N. Rajapakse, Mansooreh Zahedi, M. Ali Babar et al.

Context: DevOps has become one of the fastest-growing software development paradigms in the industry. However, this trend has presented the challenge of ensuring secure software delivery while maintaining the agility of DevOps. The efforts to integrate security in DevOps have resulted in the DevSecOps paradigm, which is gaining significant interest from both industry and academia. However, the adoption of DevSecOps in practice is proving to be a challenge. Objective: This study aims to systemize the knowledge about the challenges faced by practitioners when adopting DevSecOps and the proposed solutions reported in the literature. We also aim to identify the areas that need further research in the future. Method: We conducted a Systematic Literature Review of 54 peer-reviewed studies. The thematic analysis method was applied to analyze the extracted data. Results: We identified 21 challenges related to adopting DevSecOps, 31 specific solutions, and the mapping between these findings. We also determined key gap areas in this domain by holistically evaluating the available solutions against the challenges. The results of the study were classified into four themes: People, Practices, Tools, and Infrastructure. Our findings demonstrate that tool-related challenges and solutions were the most frequently reported, driven by the need for automation in this paradigm. Shift-left security and continuous security assessment were two key practices recommended for DevSecOps. Conclusions: We highlight the need for developer-centered application security testing tools that target the continuous practices in DevSecOps. More research is needed on how the traditionally manual security practices can be automated to suit rapid software deployment cycles. Finally, achieving a suitable balance between the speed of delivery and security is a significant issue practitioners face in the DevSecOps paradigm.

LGMar 11, 2021
ReinforceBug: A Framework to Generate Adversarial Textual Examples

Bushra Sabir, M. Ali Babar, Raj Gaire

Adversarial Examples (AEs) generated by perturbing original training examples are useful in improving the robustness of Deep Learning (DL) based models. Most prior works, generate AEs that are either unconscionable due to lexical errors or semantically or functionally deviant from original examples. In this paper, we present ReinforceBug, a reinforcement learning framework, that learns a policy that is transferable on unseen datasets and generates utility-preserving and transferable (on other models) AEs. Our results show that our method is on average 10% more successful as compared to the state-of-the-art attack TextFooler. Moreover, the target models have on average 73.64% confidence in the wrong prediction, the generated AEs preserve the functional equivalence and semantic similarity (83.38% ) to their original counterparts, and are transferable on other models with an average success rate of 46%.

CRJan 25, 2021
End-Users' Knowledge and Perception about Security of Mobile Health Apps: A Case Study with Two Saudi Arabian mHealth Providers

Bakheet Aljedaani, Aakash Ahmad, Mansooreh Zahedi et al.

Mobile health applications (mHealth apps for short) are being increasingly adopted in the healthcare sector, enabling stakeholders such as governments, health units, medics, and patients, to utilize health services in a pervasive manner. Despite having several known benefits, mHealth apps entail significant security and privacy challenges that can lead to data breaches with serious social, legal, and financial consequences. This research presents an empirical investigation about security awareness of end-users of mHealth apps that are available on major mobile platforms, including Android and iOS. We collaborated with two mHealth providers in Saudi Arabia to survey 101 end-users, investigating their security awareness about (i) existing and desired security features, (ii) security related issues, and (iii) methods to improve security knowledge. Findings indicate that majority of the end-users are aware of the existing security features provided by the apps (e.g., restricted app permissions); however, they desire usable security (e.g., biometric authentication) and are concerned about privacy of their health information (e.g., data anonymization). End-users suggested that protocols such as session timeout or Two-factor authentication (2FA) positively impact security but compromise usability of the app. Security-awareness via social media, peer guidance, or training from app providers can increase end-users trust in mHealth apps. This research investigates human-centric knowledge based on empirical evidence and provides a set of guidelines to develop secure and usable mHealth apps.

CRDec 17, 2020
Machine Learning for Detecting Data Exfiltration: A Review

Bushra Sabir, Faheem Ullah, M. Ali Babar et al.

Context: Research at the intersection of cybersecurity, Machine Learning (ML), and Software Engineering (SE) has recently taken significant steps in proposing countermeasures for detecting sophisticated data exfiltration attacks. It is important to systematically review and synthesize the ML-based data exfiltration countermeasures for building a body of knowledge on this important topic. Objective: This paper aims at systematically reviewing ML-based data exfiltration countermeasures to identify and classify ML approaches, feature engineering techniques, evaluation datasets, and performance metrics used for these countermeasures. This review also aims at identifying gaps in research on ML-based data exfiltration countermeasures. Method: We used a Systematic Literature Review (SLR) method to select and review {92} papers. Results: The review has enabled us to (a) classify the ML approaches used in the countermeasures into data-driven, and behaviour-driven approaches, (b) categorize features into six types: behavioural, content-based, statistical, syntactical, spatial and temporal, (c) classify the evaluation datasets into simulated, synthesized, and real datasets and (d) identify 11 performance measures used by these studies. Conclusion: We conclude that: (i) the integration of data-driven and behaviour-driven approaches should be explored; (ii) There is a need of developing high quality and large size evaluation datasets; (iii) Incremental ML model training should be incorporated in countermeasures; (iv) resilience to adversarial learning should be considered and explored during the development of countermeasures to avoid poisoning attacks; and (v) the use of automated feature engineering should be encouraged for efficiently detecting data exfiltration attacks.

SEDec 1, 2020
Software Security Patch Management -- A Systematic Literature Review of Challenges, Approaches, Tools and Practices

Nesara Dissanayake, Asangi Jayatilaka, Mansooreh Zahedi et al.

Context: Software security patch management purports to support the process of patching known software security vulnerabilities. Given the increasing recognition of the importance of software security patch management, it is important and timely to systematically review and synthesise the relevant literature on this topic. Objective: This paper aims at systematically reviewing the state of the art of software security patch management to identify the socio-technical challenges in this regard, reported solutions (i.e., approaches, tools, and practices), the rigour of the evaluation and the industrial relevance of the reported solutions, and to identify the gaps for future research. Method: We conducted a systematic literature review of 72 studies published from 2002 to March 2020, with extended coverage until September 2020 through forward snowballing. Results: We identify 14 socio-technical challenges, 18 solution approaches, tools and practices mapped onto the software security patch management process. We provide a mapping between the solutions and challenges to enable a reader to obtain a holistic overview of the gap areas. The findings also reveal that only 20.8% of the reported solutions have been rigorously evaluated in industrial settings. Conclusion: Our results reveal that 50% of the common challenges have not been directly addressed in the solutions and that most of them (38.9%) address the challenges in one phase of the process, namely vulnerability scanning, assessment and prioritisation. Based on the results that highlight the important concerns in software security patch management and the lack of solutions, we recommend a list of future research directions. This study also provides useful insights about different opportunities for practitioners to adopt new solutions and understand the variations of their practical utility.

SEAug 29, 2020
Security Awareness of End-Users of Mobile Health Applications: An Empirical Study

Bakheet Aljedaani, Aakash Ahmad, Mansooreh Zahedi et al.

Mobile systems offer portable and interactive computing, empowering users, to exploit a multitude of context-sensitive services, including mobile healthcare. Mobile health applications (i.e., mHealth apps) are revolutionizing the healthcare sector by enabling stakeholders to produce and consume healthcare services. A widespread adoption of mHealth technologies and rapid increase in mHealth apps entail a critical challenge, i.e., lack of security awareness by end-users regarding health-critical data. This paper presents an empirical study aimed at exploring the security awareness of end-users of mHealth apps. We collaborated with two mHealth providers in Saudi Arabia to gather data from 101 end-users. The results reveal that despite having the required knowledge, end-users lack appropriate behaviour , i.e., reluctance or lack of understanding to adopt security practices, compromising health-critical data with social, legal, and financial consequences. The results emphasize that mHealth providers should ensure security training of end-users (e.g., threat analysis workshops), promote best practices to enforce security (e.g., multi-step authentication), and adopt suitable mHealth apps (e.g., trade-offs for security vs usability). The study provides empirical evidence and a set of guidelines about security awareness of mHealth apps.

SEAug 11, 2020
Challenges in Docker Development: A Large-scale Study Using Stack Overflow

Mubin Ul Haque, Leonardo Horn Iwaya, M. Ali Babar

Docker technology has been increasingly used among software developers in a multitude of projects. This growing interest is due to the fact that Docker technology supports a convenient process for creating and building containers, promoting close cooperation between developer and operations teams, and enabling continuous software delivery. As a fast-growing technology, it is important to identify the Docker-related topics that are most popular as well as existing challenges and difficulties that developers face. This paper presents a large-scale empirical study identifying practitioners' perspectives on Docker technology by mining posts from the Stack Overflow (SoF) community. Method: A dataset of 113,922 Docker-related posts was created based on a set of relevant tags and contents. The dataset was cleaned and prepared. Topic modelling was conducted using Latent Dirichlet Allocation (LDA), allowing the identification of dominant topics in the domain. Our results show that most developers use SoF to ask about a broad spectrum of Docker topics including framework development, application deployment, continuous integration, web-server configuration and many more. We determined that 30 topics that developers discuss can be grouped into 13 main categories. Most of the posts belong to categories of application development, configuration, and networking. On the other hand, we find that the posts on monitoring status, transferring data, and authenticating users are more popular among developers compared to the other topics. Specifically, developers face challenges in web browser issues, networking error and memory management. Besides, there is a lack of experts in this domain. Our research findings will guide future work on the development of new tools and techniques, helping the community to focus efforts and understand existing trade-offs on Docker topics.

SEAug 10, 2020
A Large-scale Study of Security Vulnerability Support on Developer Q&A Websites

Triet H. M. Le, Roland Croft, David Hin et al.

Context: Security Vulnerabilities (SVs) pose many serious threats to software systems. Developers usually seek solutions to addressing these SVs on developer Question and Answer (Q&A) websites. However, there is still little known about on-going SV-specific discussions on different developer Q&A sites. Objective: We present a large-scale empirical study to understand developers' SV discussions and how these discussions are being supported by Q&A sites. Method: We first curate 71,329 SV posts from two large Q&A sites, namely Stack Overflow (SO) and Security StackExchange (SSE). We then use topic modeling to uncover the topics of SV-related discussions and analyze the popularity, difficulty, and level of expertise for each topic. We also perform a qualitative analysis to identify the types of solutions to SV-related questions. Results: We identify 13 main SV discussion topics on Q&A sites. Many topics do not follow the distributions and trends in expert-based security sources such as Common Weakness Enumeration (CWE) and Open Web Application Security Project (OWASP). We also discover that SV discussions attract more experts to answer than many other domains, but some difficult SV topics (e.g., Vulnerability Scanning Tools) still receive quite limited support from experts. Moreover, we identify seven key types of answers given to SV questions on Q&A sites, in which SO often provides code and instructions, while SSE usually gives experience-based advice and explanations. Conclusion: Our findings provide support for researchers and practitioners to effectively acquire, share and leverage SV knowledge on Q&A sites.

SEAug 7, 2020
An Empirical Study on Developing Secure Mobile Health Apps: The Developers Perspective

Bakheet Aljedaani, Aakash Ahmad, Mansooreh Zahedi et al.

Mobile apps exploit embedded sensors and wireless connectivity of a device to empower users with portable computations, context-aware communication, and enhanced interaction. Specifically, mobile health apps (mHealth apps for short) are becoming integral part of mobile and pervasive computing to improve the availability and quality of healthcare services. Despite the offered benefits, mHealth apps face a critical challenge, i.e., security of health critical data that is produced and consumed by the app. Several studies have revealed that security specific issues of mHealth apps have not been adequately addressed. The objectives of this study are to empirically (a) investigate the challenges that hinder development of secure mHealth apps, (b) identify practices to develop secure apps, and (c) explore motivating factors that influence secure development. We conducted this study by collecting responses of 97 developers from 25 countries, across 06 continents, working in diverse teams and roles to develop mHealth apps for Android, iOS, and Windows platform. Qualitative analysis of the survey data is based on (i) 8 critical challenges, (ii) taxonomy of best practices to ensure security, and (iii) 6 motivating factors that impact secure mHealth apps. This research provides empirical evidence as practitioners view and guidelines to develop emerging and next generation of secure mHealth apps.

SEJul 21, 2020
Challenges in Developing Secure Mobile Health Applications, A Systematic Review

Bakheet Aljedaani, M. Ali Babar

Mobile health (mHealth) applications (apps) have gained significant popularity over the last few years due to its tremendous benefits, such as lowering healthcare cost and increasing patient awareness. However, the sensitivity of healthcare data makes the security of mHealth apps a serious concern. In this review, we aim to identify and analyse the reported challenges that the developers of mHealth apps face concerning security. Additionally, our study aimed to develop a conceptual framework with the challenges faced by mHealth apps development organization for developing secure apps. The knowledge of such challenges can help to reduce the risk of developing insecure mHealth apps. We followed the Systematic Literature Review method for this review. We selected studies that have been published between January 2008 and October 2020. We selected 32 primary studies using predefined criteria and used thematic analysis method for analysing the extracted data. We identified nine challenges that can affect the development of secure mHealth apps. Such as 1) lack of security guidelines and regulations for developing secure mHealth apps, 2) developers lack of knowledge and expertise for secure mHealth app development, 3) lack of stakeholders involvement during mHealth app development, etc . Based on our analysis, we have presented a conceptual framework which highlights the correlation between the identified challenges. We conclude that our findings can help them identify their weaknesses and improve their security practices. Similarly, mHealth apps developers can identify the challenges they face to develop mHealth apps that do not pose security risks for users. Our review is a step towards providing insights into the development of secure mHealth apps. Our proposed conceptual framework can act as a practice guideline for practitioners to enhance secure mHealth apps development.

CRJun 22, 2020
Security and Privacy for mHealth and uHealth Systems: a Systematic Mapping Study

Leonardo Horn Iwaya, Aakash Ahmad, M. Ali Babar

An increased adoption of mobile health (mHealth) and ubiquitous health (uHealth) systems empower users with handheld devices and embedded sensors for a broad range of healthcare services. However, m/uHealth systems face significant challenges related to data security and privacy that must be addressed to increase the pervasiveness of such systems. This study aims to systematically identify, classify, compare, and evaluate state-of-the-art on security and privacy of m/uHealth systems. We conducted a systematic mapping study (SMS) based on 365 qualitatively selected studies to (i) classify the types, frequency, and demography of published research and (ii) synthesize and categorize research themes, (iii) recurring challenges, (iv) prominent solutions (i.e., research outcomes) and their (v) reported evaluations (i.e., practical validations). Results suggest that the existing research on security and privacy of m/uHealth systems primarily focuses on select group of control families (compliant with NIST800-53), protection of systems and information, access control, authentication, individual participation, and privacy authorisation. In contrast, areas of data governance, security and privacy policies, and program management are under-represented, although these are critical to most of the organizations that employ m/uHealth systems. Most research proposes new solutions with limited validation, reflecting a lack of evaluation of security and privacy of m/uHealth in the real world. Empirical research, development, and validation of m/uHealth security and privacy is still incipient, which may discourage practitioners from readily adopting solutions from the literature. This SMS facilitates knowledge transfer, enabling researchers and practitioners to engineer security and privacy for emerging and next generation of m/uHealth systems.

CRMay 18, 2020
Reliability and Robustness analysis of Machine Learning based Phishing URL Detectors

Bushra Sabir, M. Ali Babar, Raj Gaire et al.

ML-based Phishing URL (MLPU) detectors serve as the first level of defence to protect users and organisations from being victims of phishing attacks. Lately, few studies have launched successful adversarial attacks against specific MLPU detectors raising questions about their practical reliability and usage. Nevertheless, the robustness of these systems has not been extensively investigated. Therefore, the security vulnerabilities of these systems, in general, remain primarily unknown which calls for testing the robustness of these systems. In this article, we have proposed a methodology to investigate the reliability and robustness of 50 representative state-of-the-art MLPU models. Firstly, we have proposed a cost-effective Adversarial URL generator URLBUG that created an Adversarial URL dataset. Subsequently, we reproduced 50 MLPU (traditional ML and Deep learning) systems and recorded their baseline performance. Lastly, we tested the considered MLPU systems on Adversarial Dataset and analyzed their robustness and reliability using box plots and heat maps. Our results showed that the generated adversarial URLs have valid syntax and can be registered at a median annual price of \$11.99. Out of 13\% of the already registered adversarial URLs, 63.94\% were used for malicious purposes. Moreover, the considered MLPU models Matthew Correlation Coefficient (MCC) dropped from a median 0.92 to 0.02 when tested against $Adv_\mathrm{data}$, indicating that the baseline MLPU models are unreliable in their current form. Further, our findings identified several security vulnerabilities of these systems and provided future directions for researchers to design dependable and secure MLPU systems.

SEMay 16, 2020
Architectural Design Space for Modelling and Simulation as a Service: A Review

Mojtaba Shahin, M. Ali Babar, Muhammad Aufeef Chauhan

Modelling and Simulation as a Service (MSaaS) is a promising approach to deploy and execute Modelling and Simulation (M&S) applications quickly and on-demand. An appropriate software architecture is essential to deliver quality M&S applications following the MSaaS concept to a wide range of users. This study aims to characterize the state-of-the-art MSaaS architectures by conducting a systematic review of 31 papers published from 2010 to 2018. Our findings reveal that MSaaS applications are mainly designed using layered architecture style, followed by service-oriented architecture, component-based architecture, and pluggable component-based architecture. We also found that interoperability and deployability have the greatest importance in the architecture of MSaaS applications. In addition, our study indicates that the current MSaaS architectures do not meet the critical user requirements of modern M&S applications appropriately. Based on our results, we recommend that there is a need for more effort and research to (1) design the user interfaces that enable users to build and configure simulation models with minimum effort and limited domain knowledge, (2) provide mechanisms to improve the deployability of M&S applications, and (3) gain a deep insight into how M&S applications should be architected to respond to the emerging user requirements in the military domain.

SEMar 13, 2020
On the Role of Software Architecture in DevOps Transformation: An Industrial Case Study

Mojtaba Shahin, M. Ali Babar

Development and Operations (DevOps), a particular type of Continuous Software Engineering, has become a popular Software System Engineering paradigm. Software architecture is critical in succeeding with DevOps. However, there is little evidence-based knowledge of how software systems are architected in the industry to enable and support DevOps. Since architectural decisions, along with their rationales and implications, are very important in the architecting process, we performed an industrial case study that has empirically identified and synthesized the key architectural decisions considered essential to DevOps transformation by two software development teams. Our study also reveals that apart from the chosen architecture style, DevOps works best with modular architectures. In addition, we found that the performance of the studied teams can improve in DevOps if operations specialists are added to the teams to perform the operations tasks that require advanced expertise. Finally, investment in testing is inevitable for the teams if they want to release software changes faster.

SEMar 8, 2020
PUMiner: Mining Security Posts from Developer Question and Answer Websites with PU Learning

Triet H. M. Le, David Hin, Roland Croft et al.

Security is an increasing concern in software development. Developer Question and Answer (Q&A) websites provide a large amount of security discussion. Existing studies have used human-defined rules to mine security discussions, but these works still miss many posts, which may lead to an incomplete analysis of the security practices reported on Q&A websites. Traditional supervised Machine Learning methods can automate the mining process; however, the required negative (non-security) class is too expensive to obtain. We propose a novel learning framework, PUMiner, to automatically mine security posts from Q&A websites. PUMiner builds a context-aware embedding model to extract features of the posts, and then develops a two-stage PU model to identify security content using the labelled Positive and Unlabelled posts. We evaluate PUMiner on more than 17.2 million posts on Stack Overflow and 52,611 posts on Security StackExchange. We show that PUMiner is effective with the validation performance of at least 0.85 across all model configurations. Moreover, Matthews Correlation Coefficient (MCC) of PUMiner is 0.906, 0.534 and 0.084 points higher than one-class SVM, positive-similarity filtering, and one-stage PU models on unseen testing posts, respectively. PUMiner also performs well with an MCC of 0.745 for scenarios where string matching totally fails. Even when the ratio of the labelled positive posts to the unlabelled ones is only 1:100, PUMiner still achieves a strong MCC of 0.65, which is 160% better than fully-supervised learning. Using PUMiner, we provide the largest and up-to-date security content on Q&A websites for practitioners and researchers.

CRFeb 21, 2020
A Multi-Vocal Review of Security Orchestration

Chadni Islam, M. Ali Babar, Surya Nepal

Organizations use diverse types of security solutions to prevent cyberattacks. Multiple vendors provide security solutions developed using heterogeneous technologies and paradigms. Hence, it is a challenging rather impossible to easily make security solutions to work an integrated fashion. Security orchestration aims at smoothly integrating multivendor security tools that can effectively and efficiently interoperate to support security staff of a Security Operation Centre (SOC). Given the increasing role and importance of security orchestration, there has been an increasing amount of literature on different aspects of security orchestration solutions. However, there has been no effort to systematically review and analyze the reported solutions. We report a Multivocal Literature Review that has systematically selected and reviewed both academic and grey (blogs, web pages, white papers) literature on different aspects of security orchestration published from January 2007 until July 2017. The review has enabled us to provide a working definition of security orchestration and classify the main functionalities of security orchestration into three main areas: unification, orchestration, and automation. We have also identified the core components of a security orchestration platform and categorized the drivers of security orchestration based on technical and socio-technical aspects. We also provide a taxonomy of security orchestration based on the execution environment, automation strategy, deployment type, mode of task and resource type. This review has helped us to reveal several areas of further research and development in security orchestration.