DBJul 5, 2023Code
The FormAI Dataset: Generative AI in Software Security Through the Lens of Formal VerificationNorbert Tihanyi, Tamas Bisztray, Ridhi Jain et al.
This paper presents the FormAI dataset, a large collection of 112, 000 AI-generated compilable and independent C programs with vulnerability classification. We introduce a dynamic zero-shot prompting technique constructed to spawn diverse programs utilizing Large Language Models (LLMs). The dataset is generated by GPT-3.5-turbo and comprises programs with varying levels of complexity. Some programs handle complicated tasks like network management, table games, or encryption, while others deal with simpler tasks like string manipulation. Every program is labeled with the vulnerabilities found within the source code, indicating the type, line number, and vulnerable function name. This is accomplished by employing a formal verification method using the Efficient SMT-based Bounded Model Checker (ESBMC), which uses model checking, abstract interpretation, constraint programming, and satisfiability modulo theories to reason over safety/security properties in programs. This approach definitively detects vulnerabilities and offers a formal model known as a counterexample, thus eliminating the possibility of generating false positive reports. We have associated the identified vulnerabilities with Common Weakness Enumeration (CWE) numbers. We make the source code available for the 112, 000 programs, accompanied by a separate file containing the vulnerabilities detected in each program, making the dataset ideal for training LLMs and machine learning algorithms. Our study unveiled that according to ESBMC, 51.24% of the programs generated by GPT-3.5 contained vulnerabilities, thereby presenting considerable risks to software safety and security.
AIAug 9, 2024
On the use of neurosymbolic AI for defending against cyber attacksGudmund Grov, Jonas Halvorsen, Magnus Wiik Eckhoff et al.
It is generally accepted that all cyber attacks cannot be prevented, creating a need for the ability to detect and respond to cyber attacks. Both connectionist and symbolic AI are currently being used to support such detection and response. In this paper, we make the case for combining them using neurosymbolic AI. We identify a set of challenges when using AI today and propose a set of neurosymbolic use cases we believe are both interesting research directions for the neurosymbolic AI community and can have an impact on the cyber security field. We demonstrate feasibility through two proof-of-concept experiments.
LGJun 18, 2025Code
I Know Which LLM Wrote Your Code Last Summer: LLM generated Code Stylometry for Authorship AttributionTamas Bisztray, Bilel Cherif, Richard A. Dubniczky et al.
Detecting AI-generated code, deepfakes, and other synthetic content is an emerging research challenge. As code generated by Large Language Models (LLMs) becomes more common, identifying the specific model behind each sample is increasingly important. This paper presents the first systematic study of LLM authorship attribution for C programs. We released CodeT5-Authorship, a novel model that uses only the encoder layers from the original CodeT5 encoder-decoder architecture, discarding the decoder to focus on classification. Our model's encoder output (first token) is passed through a two-layer classification head with GELU activation and dropout, producing a probability distribution over possible authors. To evaluate our approach, we introduce LLM-AuthorBench, a benchmark of 32,000 compilable C programs generated by eight state-of-the-art LLMs across diverse tasks. We compare our model to seven traditional ML classifiers and eight fine-tuned transformer models, including BERT, RoBERTa, CodeBERT, ModernBERT, DistilBERT, DeBERTa-V3, Longformer, and LoRA-fine-tuned Qwen2-1.5B. In binary classification, our model achieves 97.56% accuracy in distinguishing C programs generated by closely related models such as GPT-4.1 and GPT-4o, and 95.40% accuracy for multi-class attribution among five leading LLMs (Gemini 2.5 Flash, Claude 3.5 Haiku, GPT-4.1, Llama 3.3, and DeepSeek-V3). To support open science, we release the CodeT5-Authorship architecture, the LLM-AuthorBench benchmark, and all relevant Google Colab scripts on GitHub: https://github.com/LLMauthorbench/.
CRApr 28
Towards Agentic Investigation of Security AlertsEven Eilertsen, Vasileios Mavroeidis, Gudmund Grov
Security analysts are overwhelmed by the volume of alerts and the low context provided by many detection systems. Early-stage investigations typically require manual correlation across multiple log sources, a task that is usually time-consuming. In this paper, we present an experimental, agentic workflow that leverages large language models (LLMs) augmented with predefined queries and constrained tool access (structured SQL over Suricata logs and grep-based text search) to automate the first stages of alert investigation. The proposed workflow integrates queries to provide an overview of the available data, and LLM components that selects which queries to use based on the overview results, extracts raw evidence from the query results, and delivers a final verdict of the alert. Our results demonstrate that the LLM-powered workflow can investigate log sources, plan an investigation, and produce a final verdict that has a significantly higher accuracy than a verdict produced by the same LLM without the proposed workflow. By recognizing the inherent limitations of directly applying LLMs to high-volume and unstructured data, we propose combining existing investigation practices of real-world analysts with a structured approach to leverage LLMs as virtual security analysts, thereby assisting and reducing the manual workload.
AIOct 20, 2024
Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model ConfidenceNorbert Tihanyi, Tamas Bisztray, Richard A. Dubniczky et al.
As machine intelligence evolves, the need to test and compare the problem-solving abilities of different AI models grows. However, current benchmarks are often simplistic, allowing models to perform uniformly well and making it difficult to distinguish their capabilities. Additionally, benchmarks typically rely on static question-answer pairs that the models might memorize or guess. To address these limitations, we introduce Dynamic Intelligence Assessment (DIA), a novel methodology for testing AI models using dynamic question templates and improved metrics across multiple disciplines such as mathematics, cryptography, cybersecurity, and computer science. The accompanying dataset, DIA-Bench, contains a diverse collection of challenge templates with mutable parameters presented in various formats, including text, PDFs, compiled binaries, visual puzzles, and CTF-style cybersecurity challenges. Our framework introduces four new metrics to assess a model's reliability and confidence across multiple attempts. These metrics revealed that even simple questions are frequently answered incorrectly when posed in varying forms, highlighting significant gaps in models' reliability. Notably, API models like GPT-4o often overestimated their mathematical capabilities, while ChatGPT-4o demonstrated better performance due to effective tool usage. In self-assessment, OpenAI's o1-mini proved to have the best judgement on what tasks it should attempt to solve. We evaluated 25 state-of-the-art LLMs using DIA-Bench, showing that current models struggle with complex tasks and often display unexpectedly low confidence, even with simpler questions. The DIA framework sets a new standard for assessing not only problem-solving but also a model's adaptive intelligence and ability to assess its limitations. The dataset is publicly available on the project's page: https://github.com/DIA-Bench.
CRSep 16, 2025
A Graph-Based Approach to Alert Contextualisation in Security Operations CentresMagnus Wiik Eckhoff, Peter Marius Flydal, Siem Peters et al.
Interpreting the massive volume of security alerts is a significant challenge in Security Operations Centres (SOCs). Effective contextualisation is important, enabling quick distinction between genuine threats and benign activity to prioritise what needs further analysis. This paper proposes a graph-based approach to enhance alert contextualisation in a SOC by aggregating alerts into graph-based alert groups, where nodes represent alerts and edges denote relationships within defined time-windows. By grouping related alerts, we enable analysis at a higher abstraction level, capturing attack steps more effectively than individual alerts. Furthermore, to show that our format is well suited for downstream machine learning methods, we employ Graph Matching Networks (GMNs) to correlate incoming alert groups with historical incidents, providing analysts with additional insights.
CRJun 17, 2025
LLM-Powered Intent-Based Categorization of Phishing EmailsEven Eilertsen, Vasileios Mavroeidis, Gudmund Grov
Phishing attacks remain a significant threat to modern cybersecurity, as they successfully deceive both humans and the defense mechanisms intended to protect them. Traditional detection systems primarily focus on email metadata that users cannot see in their inboxes. Additionally, these systems struggle with phishing emails, which experienced users can often identify empirically by the text alone. This paper investigates the practical potential of Large Language Models (LLMs) to detect these emails by focusing on their intent. In addition to the binary classification of phishing emails, the paper introduces an intent-type taxonomy, which is operationalized by the LLMs to classify emails into distinct categories and, therefore, generate actionable threat information. To facilitate our work, we have curated publicly available datasets into a custom dataset containing a mix of legitimate and phishing emails. Our results demonstrate that existing LLMs are capable of detecting and categorizing phishing emails, underscoring their potential in this domain.
CRJan 22, 2022
Cybersecurity Playbook Sharing with STIX 2.1Vasileios Mavroeidis, Mateusz Zych
Understanding that interoperable security playbooks will become a fundamental component of defenders' arsenal to decrease attack detection and response times, it is time to consider their position in structured sharing efforts. This report documents the process of extending Structured Threat Information eXpression (STIX) version 2.1, using the available extension definition mechanism, to enable sharing security playbooks, including Collaborative Automated Course of Action Operations (CACAO) playbooks.
CROct 20, 2021
On the Integration of Course of Action Playbooks into Shareable Cyber Threat IntelligenceVasileios Mavroeidis, Pavel Eis, Martin Zadnik et al.
Motivated by the introduction of CACAO, the first open standard that harmonizes the way we document courses of action in a machine-readable format for interoperability, and the benefits for cybersecurity operations derived from utilizing, and coupling and sharing course of action playbooks with cyber threat intelligence, we introduce a uniform metadata template that supports managing and integrating course of action playbooks into knowledge representation and knowledge management systems. We demonstrate the applicability of our approach through two use-case implementations. We utilize the playbook metadata template to introduce functionality and integrate course of action playbooks, such as CACAO, into the MISP threat intelligence platform and the OASIS Threat Actor Context ontology.
CRMar 28, 2021
Data-Driven Threat Hunting Using SysmonVasileios Mavroeidis, Audun Jøsang
Threat actors can be persistent, motivated and agile, and leverage a diversified and extensive set of tactics and techniques to attain their goals. In response to that, defenders establish threat intelligence programs to stay threat-informed and lower risk. Actionable threat intelligence is integrated into security information and event management systems (SIEM) or is accessed via more dedicated tools like threat intelligence platforms. A threat intelligence platform gives access to contextual threat information by aggregating, processing, correlating, and analyzing real-time data and information from multiple sources, and in many cases, it provides centralized analysis and reporting of an organization's security events. Sysmon logs is a data source that has received considerable attention for endpoint visibility. Approaches for threat detection using Sysmon have been proposed, mainly focusing on search engine technologies like NoSQL database systems. This paper demonstrates one of the many use cases of Sysmon and cyber threat intelligence. In particular, we present a threat assessment system that relies on a cyber threat intelligence ontology to automatically classify executed software into different threat levels by analyzing Sysmon log streams. The presented system and approach augments cyber defensive capabilities through situational awareness, prediction, and automated courses of action.
CRMar 5, 2021
Cyber Threat Intelligence Model: An Evaluation of Taxonomies, Sharing Standards, and Ontologies within Cyber Threat IntelligenceVasileios Mavroeidis, Siri Bromander
Cyber threat intelligence is the provision of evidence-based knowledge about existing or emerging threats. Benefits from threat intelligence include increased situational awareness, efficiency in security operations, and improved prevention, detection, and response capabilities. To process, correlate, and analyze vast amounts of threat information and data and derive intelligence that can be shared and consumed in meaningful times, it is required to utilize structured, machine-readable formats that incorporate the industry-required expressivity while at the same time being unambiguous. To a large extent, this is achieved with technologies like ontologies, schemas, and taxonomies. This research evaluates the coverage and high-level conceptual expressivity of cyber-threat-intelligence-relevant ontologies, sharing standards, and taxonomies pertaining to the who, what, why, where, when, and how elements of threats and attacks in addition to courses of action and technical indicators. The results confirm that little emphasis has been given to developing a comprehensive cyber threat intelligence ontology, with existing efforts being not thoroughly designed, non-interoperable, ambiguous, and lacking proper semantics and axioms for reasoning.
CRMar 3, 2021
Threat Actor Type Inference and Characterization within Cyber Threat IntelligenceVasileios Mavroeidis, Ryan Hohimer, Tim Casey et al.
As the cyber threat landscape is constantly becoming increasingly complex and polymorphic, the more critical it becomes to understand the enemy and its modus operandi for anticipatory threat reduction. Even though the cyber security community has developed a certain maturity in describing and sharing technical indicators for informing defense components, we still struggle with non-uniform, unstructured, and ambiguous higher-level information, such as the threat actor context, thereby limiting our ability to correlate with different sources to derive more contextual, accurate, and relevant intelligence. We see the need to overcome this limitation in order to increase our ability to produce and better operationalize cyber threat intelligence. Our research demonstrates how commonly agreed upon controlled vocabularies for characterizing threat actors and their operations can be used to enrich cyber threat intelligence and infer new information at a higher contextual level that is explicable and queryable. In particular, we present an ontological approach to automatically inferring the types of threat actors based on their personas, understanding their nature, and capturing polymorphism and changes in their behavior and characteristics over time. Such an approach not only enables interoperability by providing a structured way and means for sharing highly contextual cyber threat intelligence but also derives new information at machine speed and minimizes cognitive biases that manual classification approaches entail.
CVDec 17, 2020
Firearm Detection via Convolutional Neural Networks: Comparing a Semantic Segmentation Model Against End-to-End SolutionsAlexander Egiazarov, Fabio Massimo Zennaro, Vasileios Mavroeidis
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents such as terrorism, general criminal offences, or even domestic violence. One way for achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis. In this paper we conduct a comparison between a traditional monolithic end-to-end deep learning model and a previously proposed model based on an ensemble of simpler neural networks detecting fire-weapons via semantic segmentation. We evaluated both models from different points of view, including accuracy, computational and data complexity, flexibility and reliability. Our results show that a semantic segmentation model provides considerable amount of flexibility and resilience in the low data environment compared to classical deep model models, although its configuration and tuning presents a challenge in achieving the same levels of accuracy as an end-to-end model.
CVFeb 11, 2020
Firearm Detection and Segmentation Using an Ensemble of Semantic Neural NetworksAlexander Egiazarov, Vasileios Mavroeidis, Fabio Massimo Zennaro et al.
In recent years we have seen an upsurge in terror attacks around the world. Such attacks usually happen in public places with large crowds to cause the most damage possible and get the most attention. Even though surveillance cameras are assumed to be a powerful tool, their effect in preventing crime is far from clear due to either limitation in the ability of humans to vigilantly monitor video surveillance or for the simple reason that they are operating passively. In this paper, we present a weapon detection system based on an ensemble of semantic Convolutional Neural Networks that decomposes the problem of detecting and locating a weapon into a set of smaller problems concerned with the individual component parts of a weapon. This approach has computational and practical advantages: a set of simpler neural networks dedicated to specific tasks requires less computational resources and can be trained in parallel; the overall output of the system given by the aggregation of the outputs of individual networks can be tuned by a user to trade-off false positives and false negatives; finally, according to ensemble theory, the output of the overall system will be robust and reliable even in the presence of weak individual models. We evaluated our system running simulations aimed at assessing the accuracy of individual networks and the whole system. The results on synthetic data and real-world data are promising, and they suggest that our approach may have advantages compared to the monolithic approach based on a single deep convolutional neural network.
CRNov 20, 2018
Privacy Issues and Data Protection in Big Data: A Case Study Analysis under GDPRNils Gruschka, Vasileios Mavroeidis, Kamer Vishi et al.
Big data has become a great asset for many organizations, promising improved operations and new business opportunities. However, big data has increased access to sensitive information that when processed can directly jeopardize the privacy of individuals and violate data protection laws. As a consequence, data controllers and data processors may be imposed tough penalties for non-compliance that can result even to bankruptcy. In this paper, we discuss the current state of the legal regulations and analyze different data protection and privacy-preserving techniques in the context of big data analysis. In addition, we present and analyze two real-life research projects as case studies dealing with sensitive data and actions for complying with the data regulation laws. We show which types of information might become a privacy risk, the employed privacy-preserving techniques in accordance with the legal requirements, and the influence of these techniques on the data processing phase and the research results.
CRSep 25, 2018
A Framework for Data-Driven Physical Security and Insider Threat DetectionVasileios Mavroeidis, Kamer Vishi, Audun Jøsang
This paper presents PS0, an ontological framework and a methodology for improving physical security and insider threat detection. PS0 can facilitate forensic data analysis and proactively mitigate insider threats by leveraging rule-based anomaly detection. In all too many cases, rule-based anomaly detection can detect employee deviations from organizational security policies. In addition, PS0 can be considered a security provenance solution because of its ability to fully reconstruct attack patterns. Provenance graphs can be further analyzed to identify deceptive actions and overcome analytical mistakes that can result in bad decision-making, such as false attribution. Moreover, the information can be used to enrich the available intelligence (about intrusion attempts) that can form use cases to detect and remediate limitations in the system, such as loosely-coupled provenance graphs that in many cases indicate weaknesses in the physical security architecture. Ultimately, validation of the framework through use cases demonstrates and proves that PS0 can improve an organization's security posture in terms of physical security and insider threat detection.
CRMay 27, 2018
An Evaluation of Score Level Fusion Approaches for Fingerprint and Finger-vein BiometricsKamer Vishi, Vasileios Mavroeidis
Biometric systems have to address many requirements, such as large population coverage, demographic diversity, varied deployment environment, as well as practical aspects like performance and spoofing attacks. Traditional unimodal biometric systems do not fully meet the aforementioned requirements making them vulnerable and susceptible to different types of attacks. In response to that, modern biometric systems combine multiple biometric modalities at different fusion levels. The fused score is decisive to classify an unknown user as a genuine or impostor. In this paper, we evaluate combinations of score normalization and fusion techniques using two modalities (fingerprint and finger-vein) with the goal of identifying which one achieves better improvement rate over traditional unimodal biometric systems. The individual scores obtained from finger-veins and fingerprints are combined at score level using three score normalization techniques (min-max, z-score, hyperbolic tangent) and four score fusion approaches (minimum score, maximum score, simple sum, user weighting). The experimental results proved that the combination of hyperbolic tangent score normalization technique with the simple sum fusion approach achieve the best improvement rate of 99.98%.
CRMar 31, 2018
The Impact of Quantum Computing on Present CryptographyVasileios Mavroeidis, Kamer Vishi, Mateusz D. Zych et al.
The aim of this paper is to elucidate the implications of quantum computing in present cryptography and to introduce the reader to basic post-quantum algorithms. In particular the reader can delve into the following subjects: present cryptographic schemes (symmetric and asymmetric), differences between quantum and classical computing, challenges in quantum computing, quantum algorithms (Shor's and Grover's), public key encryption schemes affected, symmetric schemes affected, the impact on hash functions, and post quantum cryptography. Specifically, the section of Post-Quantum Cryptography deals with different quantum key distribution methods and mathematicalbased solutions, such as the BB84 protocol, lattice-based cryptography, multivariate-based cryptography, hash-based signatures and code-based cryptography.
CRSep 20, 2017
Automatic Detection of Malware-Generated Domains with Recurrent Neural ModelsPierre Lison, Vasileios Mavroeidis
Modern malware families often rely on domain-generation algorithms (DGAs) to determine rendezvous points to their command-and-control server. Traditional defence strategies (such as blacklisting domains or IP addresses) are inadequate against such techniques due to the large and continuously changing list of domains produced by these algorithms. This paper demonstrates that a machine learning approach based on recurrent neural networks is able to detect domain names generated by DGAs with high precision. The neural models are estimated on a large training set of domains generated by various malwares. Experimental results show that this data-driven approach can detect malware-generated domain names with a F_1 score of 0.971. To put it differently, the model can automatically detect 93 % of malware-generated domain names for a false positive rate of 1:100.