Riccardo Scandariato

SE
h-index8
15papers
269citations
Novelty26%
AI Score37

15 Papers

SEMar 16, 2023
LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations

Catherine Tony, Markus Mutas, Nicolás E. Díaz Ferreyra et al.

Large Language Models (LLMs) like Codex are powerful tools for performing code completion and code generation tasks as they are trained on billions of lines of code from publicly available sources. Moreover, these models are capable of generating code snippets from Natural Language (NL) descriptions by learning languages and programming practices from public GitHub repositories. Although LLMs promise an effortless NL-driven deployment of software applications, the security of the code they generate has not been extensively investigated nor documented. In this work, we present LLMSecEval, a dataset containing 150 NL prompts that can be leveraged for assessing the security performance of such models. Such prompts are NL descriptions of code snippets prone to various security vulnerabilities listed in MITRE's Top 25 Common Weakness Enumeration (CWE) ranking. Each prompt in our dataset comes with a secure implementation example to facilitate comparative evaluations against code produced by LLMs. As a practical application, we show how LLMSecEval can be used for evaluating the security of snippets automatically generated from NL descriptions.

CRNov 24, 2022
GitHub Considered Harmful? Analyzing Open-Source Projects for the Automatic Generation of Cryptographic API Call Sequences

Catherine Tony, Nicolás E. Díaz Ferreyra, Riccardo Scandariato

GitHub is a popular data repository for code examples. It is being continuously used to train several AI-based tools to automatically generate code. However, the effectiveness of such tools in correctly demonstrating the usage of cryptographic APIs has not been thoroughly assessed. In this paper, we investigate the extent and severity of misuses, specifically caused by incorrect cryptographic API call sequences in GitHub. We also analyze the suitability of GitHub data to train a learning-based model to generate correct cryptographic API call sequences. For this, we manually extracted and analyzed the call sequences from GitHub. Using this data, we augmented an existing learning-based model called DeepAPI to create two security-specific models that generate cryptographic API call sequences for a given natural language (NL) description. Our results indicate that it is imperative to not neglect the misuses in API call sequences while using data sources like GitHub, to train models that generate code.

SEJul 9, 2024
Prompting Techniques for Secure Code Generation: A Systematic Investigation

Catherine Tony, Nicolás E. Díaz Ferreyra, Markus Mutas et al.

Large Language Models (LLMs) are gaining momentum in software development with prompt-driven programming enabling developers to create code from natural language (NL) instructions. However, studies have questioned their ability to produce secure code and, thereby, the quality of prompt-generated software. Alongside, various prompting techniques that carefully tailor prompts have emerged to elicit optimal responses from LLMs. Still, the interplay between such prompting strategies and secure code generation remains under-explored and calls for further investigations. OBJECTIVE: In this study, we investigate the impact of different prompting techniques on the security of code generated from NL instructions by LLMs. METHOD: First we perform a systematic literature review to identify the existing prompting techniques that can be used for code generation tasks. A subset of these techniques are evaluated on GPT-3, GPT-3.5, and GPT-4 models for secure code generation. For this, we used an existing dataset consisting of 150 NL security-relevant code-generation prompts. RESULTS: Our work (i) classifies potential prompting techniques for code generation (ii) adapts and evaluates a subset of the identified techniques for secure code generation tasks and (iii) observes a reduction in security weaknesses across the tested LLMs, especially after using an existing technique called Recursive Criticism and Improvement (RCI), contributing valuable insights to the ongoing discourse on LLM-generated code security.

SEFeb 5, 2025Code
A Match Made in Heaven? Matching Test Cases and Vulnerabilities With the VUTECO Approach

Emanuele Iannone, Quang-Cuong Bui, Riccardo Scandariato

Software vulnerabilities are commonly detected via static analysis, penetration testing, and fuzzing. They can also be found by running unit tests - so-called vulnerability-witnessing tests - that stimulate the security-sensitive behavior with crafted inputs. Developing such tests is difficult and time-consuming; thus, automated data-driven approaches could help developers intercept vulnerabilities earlier. However, training and validating such approaches require a lot of data, which is currently scarce. This paper introduces VUTECO, a deep learning-based approach for collecting instances of vulnerability-witnessing tests from Java repositories. VUTECO carries out two tasks: (1) the "Finding" task to determine whether a test case is security-related, and (2) the "Matching" task to relate a test case to the exact vulnerability it is witnessing. VUTECO successfully addresses the Finding task, achieving perfect precision and 0.83 F0.5 score on validated test cases in VUL4J and returning 102 out of 145 (70%) correct security-related test cases from 244 open-source Java projects. Despite showing sufficiently good performance for the Matching task - i.e., 0.86 precision and 0.68 F0.5 score - VUTECO failed to retrieve any valid match in the wild. Nevertheless, we observed that in almost all of the matches, the test case was still security-related despite being matched to the wrong vulnerability. In the end, VUTECO can help find vulnerability-witnessing tests, though the matching with the right vulnerability is yet to be solved; the findings obtained lay the stepping stone for future research on the matter.

SEAug 19, 2021Code
Checking Security Compliance between Models and Code

Katja Tuma, Sven Peldszus, Daniel Strüber et al.

It is challenging to verify that the planned security mechanisms are actually implemented in the software. In the context of model-based development, the implemented security mechanisms must capture all intended security properties that were considered in the design models. Assuring this compliance manually is labor intensive and can be error-prone. This work introduces the first semi-automatic technique for secure data flow compliance checks between design models and code. We develop heuristic-based automated mappings between a design-level model (SecDFD, provided by humans) and a code-level representation (Program Model, automatically extracted from the implementation) in order to guide users in discovering compliance violations, and hence potential security flaws in the code. These mappings enable an automated, and project-specific static analysis of the implementation with respect to the desired security properties of the design model. We developed two types of security compliance checks and evaluated the entire approach on open source Java projects.

51.8SEApr 9
Security Concerns in Generative AI Coding Assistants: Insights from Online Discussions on GitHub Copilot

Nicolás E. Díaz Ferreyra, Monika Swetha Gurupathi, Zadia Codabux et al.

Generative Artificial Intelligence (GenAI) has become a central component of many development tools (e.g., GitHub Copilot) that support software practitioners across multiple programming tasks, including code completion, documentation, and bug detection. However, current research has identified significant limitations and open issues in GenAI, including reliability, non-determinism, bias, and copyright infringement. While prior work has primarily focused on assessing the technical performance of these technologies for code generation, less attention has been paid to emerging concerns of software developers, particularly in the security realm. OBJECTIVE: This work explores security concerns regarding the use of GenAI-based coding assistants by analyzing challenges voiced by developers and software enthusiasts in public online forums. METHOD: We retrieved posts, comments, and discussion threads addressing security issues in GitHub Copilot from three popular platforms, namely Stack Overflow, Reddit, and Hacker News. These discussions were clustered using BERTopic and then synthesized using thematic analysis to identify distinct categories of security concerns. RESULTS: Four major concern areas were identified, including potential data leakage, code licensing, adversarial attacks (e.g., prompt injection), and insecure code suggestions, underscoring critical reflections on the limitations and trade-offs of GenAI in software engineering. IMPLICATIONS: Our findings contribute to a broader understanding of how developers perceive and engage with GenAI-based coding assistants, while highlighting key areas for improving their built-in security features.

CRFeb 3, 2022
SoK: Security of Microservice Applications: A Practitioners' Perspective on Challenges and Best Practices

Priyanka Billawa, Anusha Bambhore Tukaram, Nicolás E. Díaz Ferreyra et al.

Cloud-based application deployment is becoming increasingly popular among businesses, thanks to the emergence of microservices. However, securing such architectures is a challenging task since traditional security concepts cannot be directly applied to microservice architectures due to their distributed nature. The situation is exacerbated by the scattered nature of guidelines and best practices advocated by practitioners and organizations in this field. This research paper we aim to shay light over the current microservice security discussions hidden within Grey Literature (GL) sources. Particularly, we identify the challenges that arise when securing microservice architectures, as well as solutions recommended by practitioners to address these issues. For this, we conducted a systematic GL study on the challenges and best practices of microservice security present in the Internet with the goal of capturing relevant discussions in blogs, white papers, and standards. We collected 312 GL sources from which 57 were rigorously classified and analyzed. This analysis on the one hand validated past academic literature studies in the area of microservice security, but it also identified improvements to existing methodologies pointing towards future research directions.

SEMar 4, 2021
Secure Software Development in the Era of Fluid Multi-party Open Software and Services

Ivan Pashchenko, Riccardo Scandariato, Antonino Sabetta et al.

Pushed by market forces, software development has become fast-paced. As a consequence, modern development projects are assembled from 3rd-party components. Security & privacy assurance techniques once designed for large, controlled updates over months or years, must now cope with small, continuous changes taking place within a week, and happening in sub-components that are controlled by third-party developers one might not even know they existed. In this paper, we aim to provide an overview of the current software security approaches and evaluate their appropriateness in the face of the changed nature in software development. Software security assurance could benefit by switching from a process-based to an artefact-based approach. Further, security evaluation might need to be more incremental, automated and decentralized. We believe this can be achieved by supporting mechanisms for lightweight and scalable screenings that are applicable to the entire population of software components albeit there might be a price to pay.

CRJun 7, 2020
Contextualisation of Data Flow Diagrams for security analysis

Shamal Faily, Riccardo Scandariato, Adam Shostack et al.

Data flow diagrams (DFDs) are popular for sketching systems for subsequent threat modelling. Their limited semantics make reasoning about them difficult, but enriching them endangers their simplicity and subsequent ease of take up. We present an approach for reasoning about tainted data flows in design-level DFDs by putting them in context with other complementary usability and requirements models. We illustrate our approach using a pilot study, where tainted data flows were identified without any augmentations to either the DFD or its complementary models.

SEMar 31, 2020
Security Assurance Cases -- State of the Art of an Emerging Approach

Mazen Mohamad, Jan-Philipp Steghöfer, Riccardo Scandariato

Security Assurance Cases (SAC) are a form of structured argumentation used to reason about the security properties of a system. After the successful adoption of assurance cases for safety, SACs are getting significant traction in recent years, especially in safety-critical industries (e.g., automotive), where there is an increasing pressure to be compliant with several security standards and regulations. Accordingly, research in the field of SAC has flourished in the past decade, with different approaches being investigated. In an effort to systematize this active field of research, we conducted a systematic literature review (SLR) of the existing academic studies on SAC. Our review resulted in an in-depth analysis and comparison of 51 papers. Our results indicate that, while there are numerous papers discussing the importance of security assurance cases and their usage scenarios, the literature is still immature with respect to concrete support for practitioners on how to build and maintain a SAC. More importantly, even though some methodologies are available, their validation and tool support is still lacking.

SEMar 31, 2020
Cross-project Classification of Security-related Requirements

Mazen Mohamad, Jan-Philipp Steghöfer, Riccardo Scandariato

We investigate the feasibility of using a classifier for security-related requirements trained on requirement specifications available online. This is helpful in case different requirement types are not differentiated in a large existing requirement specification. Our work is motivated by the need to identify security requirements for the creation of security assurance cases that become a necessity for many organizations with new and upcoming standards like GDPR and HiPAA. We base our investigation on ten requirement specifications, randomly selected from a Google Search and partially pre-labeled. To validate the model, we run 10-fold cross-validation on the data where each specification constitutes a group. Our results indicate the feasibility of training a model from a heterogeneous data set including specifications from multiple domains and in different styles. However, performance benefits from revising the pre-labeled data for consistency. Additionally, we show that classifiers trained only on a specific specification type fare worse and that the way requirements are written has no impact on classifier accuracy.

SEMar 31, 2020
Security Assurance Cases for Road Vehicles: an Industry Perspective

Mazen Mohamad, Alexander Åström, Örjan Askerdal et al.

Assurance cases are structured arguments that are commonly used to reason about the safety of a product or service. Currently, there is an ongoing push towards using assurance cases for also cybersecurity, especially in safety-critical domains, like automotive. While the industry is faced with the challenge of defining a sound methodology to build security assurance cases, the state of the art is rather immature. Therefore, we have conducted a thorough investigation of the (external) constraints and (internal) needs that security assurance cases have to satisfy in the context of the automotive industry. This has been done in the context of two large automotive companies in Sweden. The end result is a set of recommendations that automotive companies can apply in order to define security assurance cases that are (i) aligned with the constraints imposed by the existing and upcoming standards and regulations and (ii)harmonized with the internal product development processes and organizational practices. We expect the results to be also of interest for product companies in other safety-critical domains, like healthcare, transportation, and so on

SEJan 8, 2020
Perception and Acceptance of an Autonomous Refactoring Bot

Marvin Wyrich, Regina Hebig, Stefan Wagner et al.

The use of autonomous bots for automatic support in software development tasks is increasing. In the past, however, they were not always perceived positively and sometimes experienced a negative bias compared to their human counterparts. We conducted a qualitative study in which we deployed an autonomous refactoring bot for 41 days in a student software development project. In between and at the end, we conducted semi-structured interviews to find out how developers perceive the bot and whether they are more or less critical when reviewing the contributions of a bot compared to human contributions. Our findings show that the bot was perceived as a useful and unobtrusive contributor, and developers were no more critical of it than they were about their human colleagues, but only a few team members felt responsible for the bot.

SEOct 8, 2019
Finding Security Threats That Matter: An Industrial Case Study

Katja Tuma, Christian Sandberg, Urban Thorsson et al.

Recent trends in the software engineering (i.e., Agile, DevOps) have shortened the development life-cycle limiting resources spent on security analysis of software designs. In this context, architecture models are (often manually) analyzed for potential security threats. Risk-last threat analysis suggests identifying all security threats before prioritizing them. In contrast, risk-first threat analysis suggests identifying the risks before the threats, by-passing threat prioritization. This seems promising for organizations where developing speed is of great importance. Yet, little empirical evidence exists about the effect of sacrificing systematicity for high-priority threats on the performance and execution of threat analysis. To this aim, we conduct a case study with industrial experts from the automotive domain, where we empirically compare a risk-first technique to a risk-last technique. In this study, we consciously trade the amount of participants for a more realistic simulation of threat analysis sessions in practice. This allows us to closely observe industrial experts and gain deep insights into the industrial practice. This work contributes with: (i) a quantitative comparison of performance, (ii) a quantitative and qualitative comparison of execution, and (iii) a comparative discussion of the two techniques. We find no differences in the productivity and timeliness of discovering high-priority security threats. Yet, we find differences in analysis execution. In particular, participants using the risk-first technique found twice as many high-priority threats, developed detailed attack scenarios, and discussed threat feasibility in detail. On the other hand, participants using the risk-last technique found more medium and low-priority threats and finished early.

SEJun 5, 2019
Inspection Guidelines to Identify Security Design Flaws

Katja Tuma, Danial Hosseini, Kyriakos Malamas et al.

Recent trends in the software development practices (Agile, DevOps, CI) have shortened the development life-cycle causing the need for efficient security-by-design approaches. In this context, software architectures are analyzed for potential vulnerabilities and design flaws. Yet, design flaws are often documented with natural language and require a manual analysis, which is inefficient. Besides low-level vulnerability databases (e.g., CWE, CAPEC) there is little systematized knowledge on security design flaws. The purpose of this work is to provide a catalog of security design flaws and to empirically evaluate the inspection guidelines for detecting security design flaws. To this aim, we present a catalog of 19 security design flaws and conduct empirical studies with master and doctoral students. This paper contributes with: (i) a catalog of security design flaws, (ii) an empirical evaluation of the inspection guidelines with master students, and (iii) a replicated evaluation with doctoral students. We also account for the shortcomings of the inspection guidelines and make suggestions for their improvement with respect to the generalization of guidelines, catalog re-organization, and format of documentation. We record similar precision, recall, and productivity in both empirical studies and discuss the potential for automating the security design flaw detection.