Katja Tuma

SE
h-index: 4
5 papers
49 citations
Novelty: 35%
AI Score: 27

5 Papers

SE, Jul 14, 2024
Risks of ignoring uncertainty propagation in AI-augmented security pipelines

Emanuele Mezzi, Aurora Papotti, Fabio Massacci et al.

The use of AI technologies is being integrated into the secure development of software-based systems, with an increasing trend of composing AI-based subsystems (with uncertain levels of performance) into automated pipelines. This presents a fundamental research challenge and seriously threatens safety-critical domains. Despite the existing knowledge about uncertainty in risk analysis, no previous work has estimated the uncertainty of AI-augmented systems given the propagation of errors in the pipeline. We provide the formal underpinnings for capturing uncertainty propagation, develop a simulator to quantify uncertainty, and evaluate the simulation of propagating errors with one case study. We discuss the generalizability of our approach and its limitations and present recommendations for evaluation policies concerning AI systems. Future work includes extending the approach by relaxing the remaining assumptions and by experimenting with a real system.
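To make the idea of error propagation concrete, here is a minimal Monte Carlo sketch. It is not the simulator from the paper. It assumes, purely for illustration, that stages run sequentially, fail independently, and that one wrong stage makes the final output wrong. The function name `simulate_pipeline` and the accuracy ranges are hypothetical.

```python
import random
import statistics


def simulate_pipeline(stage_accuracy_ranges, n_trials=10_000, seed=0):
    """Estimate end-to-end accuracy of a sequential pipeline by Monte Carlo.

    Each stage's true accuracy is uncertain, so on every trial it is drawn
    uniformly from its (low, high) range. Assumption: stages fail
    independently, so end-to-end accuracy is the product of stage accuracies.
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_trials):
        p_correct = 1.0
        for low, high in stage_accuracy_ranges:
            p_correct *= rng.uniform(low, high)
        outcomes.append(p_correct)
    outcomes.sort()
    return {
        "mean": statistics.fmean(outcomes),
        "p05": outcomes[int(0.05 * n_trials)],  # 5th percentile
        "p95": outcomes[int(0.95 * n_trials)],  # 95th percentile
    }


# Hypothetical three-stage pipeline; the ranges are made-up illustration values.
result = simulate_pipeline([(0.85, 0.95), (0.80, 0.90), (0.90, 0.99)])
print(result)
```

Even when every stage looks acceptable on its own, the end-to-end interval is both lower and wider than any single stage's range, which is the effect that evaluating components in isolation would miss.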

SE, Aug 19, 2021, Code
Checking Security Compliance between Models and Code

Katja Tuma, Sven Peldszus, Daniel Strüber et al.

It is challenging to verify that the planned security mechanisms are actually implemented in the software. In the context of model-based development, the implemented security mechanisms must capture all intended security properties that were considered in the design models. Assuring this compliance manually is labor-intensive and can be error-prone. This work introduces the first semi-automatic technique for secure data flow compliance checks between design models and code. We develop heuristic-based automated mappings between a design-level model (SecDFD, provided by humans) and a code-level representation (Program Model, automatically extracted from the implementation) in order to guide users in discovering compliance violations, and hence potential security flaws in the code. These mappings enable an automated, project-specific static analysis of the implementation with respect to the desired security properties of the design model. We developed two types of security compliance checks and evaluated the entire approach on open source Java projects.

CR, Mar 29, 2025
Large Language Models Are Unreliable for Cyber Threat Intelligence

Emanuele Mezzi, Fabio Massacci, Katja Tuma

Several recent works have argued that Large Language Models (LLMs) can be used to tame the data deluge in the cybersecurity field by improving the automation of Cyber Threat Intelligence (CTI) tasks. This work presents an evaluation methodology that, besides testing LLMs on CTI tasks under zero-shot learning, few-shot learning, and fine-tuning, also quantifies their consistency and their confidence level. We run experiments with three state-of-the-art LLMs and a dataset of 350 threat intelligence reports and present new evidence of potential security risks in relying on LLMs for CTI. We show that LLMs cannot guarantee sufficient performance on real-size reports and are also inconsistent and overconfident. Few-shot learning and fine-tuning only partially improve the results, which casts doubt on the possibility of using LLMs for CTI scenarios, where labelled datasets are lacking and where confidence is a fundamental factor.
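As an illustration of what quantifying consistency and overconfidence can mean, here is a small sketch. The two metrics below are simple stand-ins chosen for this example, not the measures defined in the paper. The function names, labels, and confidence values are all hypothetical.

```python
from collections import Counter


def consistency(answers):
    """Share of repeated runs that agree with the most common answer."""
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)


def overconfidence(predictions, confidences, truths):
    """Mean stated confidence minus observed accuracy.

    A positive value means the model claims more certainty than its
    accuracy supports.
    """
    accuracy = sum(p == t for p, t in zip(predictions, truths)) / len(truths)
    mean_confidence = sum(confidences) / len(confidences)
    return mean_confidence - accuracy


# Made-up data: five repeated runs of one (hypothetical) model on one report.
runs = ["actor_A", "actor_A", "actor_B", "actor_A", "actor_C"]
print(consistency(runs))  # 0.6

# Made-up data: four reports, each with a prediction, a stated confidence,
# and a ground-truth label.
gap = overconfidence(
    predictions=["actor_A", "actor_B", "actor_C", "actor_D"],
    confidences=[0.9, 0.9, 0.9, 0.9],
    truths=["actor_A", "actor_B", "actor_X", "actor_Y"],
)
print(gap > 0)  # True
```

In this toy data the model states 90% confidence but is right on half of the reports, so the gap is positive. That is the sense in which a model can be called overconfident.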

SE, Oct 8, 2019
Finding Security Threats That Matter: An Industrial Case Study

Katja Tuma, Christian Sandberg, Urban Thorsson et al.

Recent trends in software engineering (e.g., Agile, DevOps) have shortened the development life-cycle, limiting the resources spent on security analysis of software designs. In this context, architecture models are (often manually) analyzed for potential security threats. Risk-last threat analysis suggests identifying all security threats before prioritizing them. In contrast, risk-first threat analysis suggests identifying the risks before the threats, bypassing threat prioritization. This seems promising for organizations where development speed is of great importance. Yet, little empirical evidence exists about the effect of sacrificing systematicity for high-priority threats on the performance and execution of threat analysis. To this aim, we conduct a case study with industrial experts from the automotive domain, where we empirically compare a risk-first technique to a risk-last technique. In this study, we consciously trade the number of participants for a more realistic simulation of threat analysis sessions in practice. This allows us to closely observe industrial experts and gain deep insights into industrial practice. This work contributes: (i) a quantitative comparison of performance, (ii) a quantitative and qualitative comparison of execution, and (iii) a comparative discussion of the two techniques. We find no differences in the productivity and timeliness of discovering high-priority security threats. Yet, we find differences in analysis execution. In particular, participants using the risk-first technique found twice as many high-priority threats, developed detailed attack scenarios, and discussed threat feasibility in detail. On the other hand, participants using the risk-last technique found more medium and low-priority threats and finished early.

SE, Jun 5, 2019
Inspection Guidelines to Identify Security Design Flaws

Katja Tuma, Danial Hosseini, Kyriakos Malamas et al.

Recent trends in software development practices (Agile, DevOps, CI) have shortened the development life-cycle, creating the need for efficient security-by-design approaches. In this context, software architectures are analyzed for potential vulnerabilities and design flaws. Yet, design flaws are often documented in natural language and require a manual analysis, which is inefficient. Besides low-level vulnerability databases (e.g., CWE, CAPEC), there is little systematized knowledge on security design flaws. The purpose of this work is to provide a catalog of security design flaws and to empirically evaluate the inspection guidelines for detecting security design flaws. To this aim, we present a catalog of 19 security design flaws and conduct empirical studies with master's and doctoral students. This paper contributes: (i) a catalog of security design flaws, (ii) an empirical evaluation of the inspection guidelines with master's students, and (iii) a replicated evaluation with doctoral students. We also account for the shortcomings of the inspection guidelines and make suggestions for their improvement with respect to the generalization of guidelines, catalog re-organization, and format of documentation. We record similar precision, recall, and productivity in both empirical studies and discuss the potential for automating the security design flaw detection.