Mohamed Amine Ferrag

h-index46

39papers

2,661citations

Novelty27%

AI Score53

Ranked #32,690 of 205,806 authors (top 16%)#554 in CR (top 8%)

39 Papers

DBJul 5, 2023Code

The FormAI Dataset: Generative AI in Software Security Through the Lens of Formal Verification

Norbert Tihanyi, Tamas Bisztray, Ridhi Jain et al.

This paper presents the FormAI dataset, a large collection of 112, 000 AI-generated compilable and independent C programs with vulnerability classification. We introduce a dynamic zero-shot prompting technique constructed to spawn diverse programs utilizing Large Language Models (LLMs). The dataset is generated by GPT-3.5-turbo and comprises programs with varying levels of complexity. Some programs handle complicated tasks like network management, table games, or encryption, while others deal with simpler tasks like string manipulation. Every program is labeled with the vulnerabilities found within the source code, indicating the type, line number, and vulnerable function name. This is accomplished by employing a formal verification method using the Efficient SMT-based Bounded Model Checker (ESBMC), which uses model checking, abstract interpretation, constraint programming, and satisfiability modulo theories to reason over safety/security properties in programs. This approach definitively detects vulnerabilities and offers a formal model known as a counterexample, thus eliminating the possibility of generating false positive reports. We have associated the identified vulnerabilities with Common Weakness Enumeration (CWE) numbers. We make the source code available for the 112, 000 programs, accompanied by a separate file containing the vulnerabilities detected in each program, making the dataset ideal for training LLMs and machine learning algorithms. Our study unveiled that according to ESBMC, 51.24% of the programs generated by GPT-3.5 contained vulnerabilities, thereby presenting considerable risks to software safety and security.

NIFeb 9Code

6G-Bench: An Open Benchmark for Semantic Communication and Network-Level Reasoning with Foundation Models in AI-Native 6G Networks

Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah

This paper introduces 6G-Bench, an open benchmark for evaluating semantic communication and network-level reasoning in AI-native 6G networks. 6G-Bench defines a taxonomy of 30 decision-making tasks (T1--T30) extracted from ongoing 6G and AI-agent standardization activities in 3GPP, IETF, ETSI, ITU-T, and the O-RAN Alliance, and organizes them into five standardization-aligned capability categories. Starting from 113,475 scenarios, we generate a balanced pool of 10,000 very-hard multiple-choice questions using task-conditioned prompts that enforce multi-step quantitative reasoning under uncertainty and worst-case regret minimization over multi-turn horizons. After automated filtering and expert human validation, 3,722 questions are retained as a high-confidence evaluation set, while the full pool is released to support training and fine-tuning of 6G-specialized models. Using 6G-Bench, we evaluate 22 foundation models spanning dense and mixture-of-experts architectures, short- and long-context designs (up to 1M tokens), and both open-weight and proprietary systems. Across models, deterministic single-shot accuracy (pass@1) spans a wide range from 0.22 to 0.82, highlighting substantial variation in semantic reasoning capability. Leading models achieve intent and policy reasoning accuracy in the range 0.87--0.89, while selective robustness analysis on reasoning-intensive tasks shows pass@5 values ranging from 0.20 to 0.91. To support open science and reproducibility, we release the 6G-Bench dataset on GitHub: https://github.com/maferrag/6G-Bench

SYJan 1Code

$α^3$-Bench: A Unified Benchmark of Safety, Robustness, and Efficiency for LLM-Based UAV Agents over 6G Networks

Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah

Large Language Models (LLMs) are increasingly used as high level controllers for autonomous Unmanned Aerial Vehicle (UAV) missions. However, existing evaluations rarely assess whether such agents remain safe, protocol compliant, and effective under realistic next generation networking constraints. This paper introduces $α^3$-Bench, a benchmark for evaluating LLM driven UAV autonomy as a multi turn conversational reasoning and control problem operating under dynamic 6G conditions. Each mission is formulated as a language mediated control loop between an LLM based UAV agent and a human operator, where decisions must satisfy strict schema validity, mission policies, speaker alternation, and safety constraints while adapting to fluctuating network slices, latency, jitter, packet loss, throughput, and edge load variations. To reflect modern agentic workflows, $α^3$-Bench integrates a dual action layer supporting both tool calls and agent to agent coordination, enabling evaluation of tool use consistency and multi agent interactions. We construct a large scale corpus of 113k conversational UAV episodes grounded in UAVBench scenarios and evaluate 17 state of the art LLMs using a fixed subset of 50 episodes per scenario under deterministic decoding. We propose a composite $α^3$ metric that unifies six pillars: Task Outcome, Safety Policy, Tool Consistency, Interaction Quality, Network Robustness, and Communication Cost, with efficiency normalized scores per second and per thousand tokens. Results show that while several models achieve high mission success and safety compliance, robustness and efficiency vary significantly under degraded 6G conditions, highlighting the need for network aware and resource efficient LLM based UAV agents. The dataset is publicly available on GitHub : https://github.com/maferrag/AlphaBench

LGMar 2Code

FreeGNN: Continual Source-Free Graph Neural Network Adaptation for Renewable Energy Forecasting

Abderaouf Bahi, Amel Ourici, Ibtissem Gasmi et al.

Accurate forecasting of renewable energy generation is essential for efficient grid management and sustainable power planning. However, traditional supervised models often require access to labeled data from the target site, which may be unavailable due to privacy, cost, or logistical constraints. In this work, we propose FreeGNN, a Continual Source-Free Graph Domain Adaptation framework that enables adaptive forecasting on unseen renewable energy sites without requiring source data or target labels. Our approach integrates a spatio-temporal Graph Neural Network (GNN) backbone with a teacher--student strategy, a memory replay mechanism to mitigate catastrophic forgetting, graph-based regularization to preserve spatial correlations, and a drift-aware weighting scheme to dynamically adjust adaptation strength during streaming updates. This combination allows the model to continuously adapt to non-stationary environmental conditions while maintaining robustness and stability. We conduct extensive experiments on three real-world datasets: GEFCom2012, Solar PV, and Wind SCADA, encompassing multiple sites, temporal resolutions, and meteorological features. The ablation study confirms that each component memory, graph regularization, drift-aware adaptation, and teacher--student strategy contributes significantly to overall performance. The experiments show that FreeGNN achieves an MAE of 5.237 and an RMSE of 7.123 on the GEFCom dataset, an MAE of 1.107 and an RMSE of 1.512 on the Solar PV dataset, and an MAE of 0.382 and an RMSE of 0.523 on the Wind SCADA dataset. These results demonstrate its ability to achieve accurate and robust forecasts in a source-free, continual learning setting, highlighting its potential for real-world deployment in adaptive renewable energy systems. For reproducibility, implementation details are available at: https://github.com/AraoufBh/FreeGNN.

NIMar 2Code

How Small Can 6G Reason? Scaling Tiny Language Models for AI-Native Networks

Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah

Emerging 6G visions, reflected in ongoing standardization efforts within 3GPP, IETF, ETSI, ITU-T, and the O-RAN Alliance, increasingly characterize networks as AI-native systems in which high-level semantic reasoning layers operate above standardized control and data-plane functions. Although frontier-scale large language models (LLMs) such as Qwen2.5-7B and Olmo-3-7B demonstrate strong reasoning capability, their computational footprint limits deployment in latency-sensitive, edge-native infrastructures. This paper presents a systematic empirical study of the scaling behavior and deployment efficiency of compact language models for network-level semantic reasoning in AI-native 6G systems. Using 6G-Bench, a standardization-aligned benchmark comprising 30 decision-making tasks across five capability domains, we evaluate models ranging from 135M (SmolLM2-135M) to 7B parameters (Qwen2.5-7B), including mid-scale architectures such as Llama-3.2-1B, Granite-1B, and Qwen2.5-3B. Deterministic accuracy (pass@1) increases from 0.224 at 135M to 0.707 at 7B, but scaling gains are highly non-uniform. A pronounced stability transition occurs in the 1 to 1.5B range, where accuracy rises from 0.373 (Llama-3.2-1B) to 0.531 (Qwen2.5-1.5B) and the instability gap Delta_5 contracts from 0.356 to 0.138. Beyond 3B parameters, improvements diminish (+0.064 from 3B to 7B). Through single-query inference profiling and an Edge Score metric that normalizes accuracy by latency and memory footprint, we show that semantic reliability per unit edge resource does not scale monotonically with parameter count. Instead, mid-scale models (approximately 1.5 to 3B) achieve the most favorable balance between deterministic stability and computational efficiency, providing deployment-relevant guidance for AI-native 6G architectures. All scripts and results are publicly available at https://github.com/maferrag/6G-Bench

CRJan 26Code

$α^3$-SecBench: A Large-Scale Evaluation Suite of Security, Resilience, and Trust for LLM-based UAV Agents over 6G Networks

Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah

Autonomous unmanned aerial vehicle (UAV) systems are increasingly deployed in safety-critical, networked environments where they must operate reliably in the presence of malicious adversaries. While recent benchmarks have evaluated large language model (LLM)-based UAV agents in reasoning, navigation, and efficiency, systematic assessment of security, resilience, and trust under adversarial conditions remains largely unexplored, particularly in emerging 6G-enabled settings. We introduce $α^{3}$-SecBench, the first large-scale evaluation suite for assessing the security-aware autonomy of LLM-based UAV agents under realistic adversarial interference. Building on multi-turn conversational UAV missions from $α^{3}$-Bench, the framework augments benign episodes with 20,000 validated security overlay attack scenarios targeting seven autonomy layers, including sensing, perception, planning, control, communication, edge/cloud infrastructure, and LLM reasoning. $α^{3}$-SecBench evaluates agents across three orthogonal dimensions: security (attack detection and vulnerability attribution), resilience (safe degradation behavior), and trust (policy-compliant tool usage). We evaluate 23 state-of-the-art LLMs from major industrial providers and leading AI labs using thousands of adversarially augmented UAV episodes sampled from a corpus of 113,475 missions spanning 175 threat types. While many models reliably detect anomalous behavior, effective mitigation, vulnerability attribution, and trustworthy control actions remain inconsistent. Normalized overall scores range from 12.9% to 57.1%, highlighting a significant gap between anomaly detection and security-aware autonomous decision-making. We release $α^{3}$-SecBench on GitHub: https://github.com/maferrag/AlphaSecBench

CRJun 25, 2023

Revolutionizing Cyber Threat Detection with Large Language Models: A privacy-preserving BERT-based Lightweight Model for IoT/IIoT Devices

Mohamed Amine Ferrag, Mthandazo Ndhlovu, Norbert Tihanyi et al.

The field of Natural Language Processing (NLP) is currently undergoing a revolutionary transformation driven by the power of pre-trained Large Language Models (LLMs) based on groundbreaking Transformer architectures. As the frequency and diversity of cybersecurity attacks continue to rise, the importance of incident detection has significantly increased. IoT devices are expanding rapidly, resulting in a growing need for efficient techniques to autonomously identify network-based attacks in IoT networks with both high precision and minimal computational requirements. This paper presents SecurityBERT, a novel architecture that leverages the Bidirectional Encoder Representations from Transformers (BERT) model for cyber threat detection in IoT networks. During the training of SecurityBERT, we incorporated a novel privacy-preserving encoding technique called Privacy-Preserving Fixed-Length Encoding (PPFLE). We effectively represented network traffic data in a structured format by combining PPFLE with the Byte-level Byte-Pair Encoder (BBPE) Tokenizer. Our research demonstrates that SecurityBERT outperforms traditional Machine Learning (ML) and Deep Learning (DL) methods, such as Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs), in cyber threat detection. Employing the Edge-IIoTset cybersecurity dataset, our experimental analysis shows that SecurityBERT achieved an impressive 98.2% overall accuracy in identifying fourteen distinct attack types, surpassing previous records set by hybrid solutions such as GAN-Transformer-based architectures and CNN-LSTM models. With an inference time of less than 0.15 seconds on an average CPU and a compact model size of just 16.7MB, SecurityBERT is ideally suited for real-life traffic analysis and a suitable choice for deployment on resource-constrained IoT devices.

AIJan 23Code

AgentDrive: An Open Benchmark Dataset for Agentic AI Reasoning with LLM-Generated Scenarios in Autonomous Systems

Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah

The rapid advancement of large language models (LLMs) has sparked growing interest in their integration into autonomous systems for reasoning-driven perception, planning, and decision-making. However, evaluating and training such agentic AI models remains challenging due to the lack of large-scale, structured, and safety-critical benchmarks. This paper introduces AgentDrive, an open benchmark dataset containing 300,000 LLM-generated driving scenarios designed for training, fine-tuning, and evaluating autonomous agents under diverse conditions. AgentDrive formalizes a factorized scenario space across seven orthogonal axes: scenario type, driver behavior, environment, road layout, objective, difficulty, and traffic density. An LLM-driven prompt-to-JSON pipeline generates semantically rich, simulation-ready specifications that are validated against physical and schema constraints. Each scenario undergoes simulation rollouts, surrogate safety metric computation, and rule-based outcome labeling. To complement simulation-based evaluation, we introduce AgentDrive-MCQ, a 100,000-question multiple-choice benchmark spanning five reasoning dimensions: physics, policy, hybrid, scenario, and comparative reasoning. We conduct a large-scale evaluation of fifty leading LLMs on AgentDrive-MCQ. Results show that while proprietary frontier models perform best in contextual and policy reasoning, advanced open models are rapidly closing the gap in structured and physics-grounded reasoning. We release the AgentDrive dataset, AgentDrive-MCQ benchmark, evaluation code, and related materials at https://github.com/maferrag/AgentDrive

AINov 14, 2025Code

UAVBench: An Open Benchmark Dataset for Autonomous and Agentic AI UAV Systems via LLM-Generated Flight Scenarios

Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah

Autonomous aerial systems increasingly rely on large language models (LLMs) for mission planning, perception, and decision-making, yet the lack of standardized and physically grounded benchmarks limits systematic evaluation of their reasoning capabilities. To address this gap, we introduce UAVBench, an open benchmark dataset comprising 50,000 validated UAV flight scenarios generated through taxonomy-guided LLM prompting and multi-stage safety validation. Each scenario is encoded in a structured JSON schema that includes mission objectives, vehicle configuration, environmental conditions, and quantitative risk labels, providing a unified representation of UAV operations across diverse domains. Building on this foundation, we present UAVBench_MCQ, a reasoning-oriented extension containing 50,000 multiple-choice questions spanning ten cognitive and ethical reasoning styles, ranging from aerodynamics and navigation to multi-agent coordination and integrated reasoning. This framework enables interpretable and machine-checkable assessment of UAV-specific cognition under realistic operational contexts. We evaluate 32 state-of-the-art LLMs, including GPT-5, ChatGPT-4o, Gemini 2.5 Flash, DeepSeek V3, Qwen3 235B, and ERNIE 4.5 300B, and find strong performance in perception and policy reasoning but persistent challenges in ethics-aware and resource-constrained decision-making. UAVBench establishes a reproducible and physically grounded foundation for benchmarking agentic AI in autonomous aerial systems and advancing next-generation UAV reasoning intelligence. To support open science and reproducibility, we release the UAVBench dataset, the UAVBench_MCQ benchmark, evaluation scripts, and all related materials on GitHub at https://github.com/maferrag/UAVBench

CRMar 21, 2023

Poisoning Attacks in Federated Edge Learning for Digital Twin 6G-enabled IoTs: An Anticipatory Study

Mohamed Amine Ferrag, Burak Kantarci, Lucas C. Cordeiro et al.

Federated edge learning can be essential in supporting privacy-preserving, artificial intelligence (AI)-enabled activities in digital twin 6G-enabled Internet of Things (IoT) environments. However, we need to also consider the potential of attacks targeting the underlying AI systems (e.g., adversaries seek to corrupt data on the IoT devices during local updates or corrupt the model updates); hence, in this article, we propose an anticipatory study for poisoning attacks in federated edge learning for digital twin 6G-enabled IoT environments. Specifically, we study the influence of adversaries on the training and development of federated learning models in digital twin 6G-enabled IoT environments. We demonstrate that attackers can carry out poisoning attacks in two different learning settings, namely: centralized learning and federated learning, and successful attacks can severely reduce the model's accuracy. We comprehensively evaluate the attacks on a new cyber security dataset designed for IoT applications with three deep neural networks under the non-independent and identically distributed (Non-IID) data and the independent and identically distributed (IID) data. The poisoning attacks, on an attack classification problem, can lead to a decrease in accuracy from 94.93% to 85.98% with IID data and from 94.18% to 30.04% with Non-IID.

CRJul 13, 2023

SecureFalcon: Are We There Yet in Automated Software Vulnerability Detection with LLMs?

Mohamed Amine Ferrag, Ammar Battah, Norbert Tihanyi et al.

Software vulnerabilities can cause numerous problems, including crashes, data loss, and security breaches. These issues greatly compromise quality and can negatively impact the market adoption of software applications and systems. Traditional bug-fixing methods, such as static analysis, often produce false positives. While bounded model checking, a form of Formal Verification (FV), can provide more accurate outcomes compared to static analyzers, it demands substantial resources and significantly hinders developer productivity. Can Machine Learning (ML) achieve accuracy comparable to FV methods and be used in popular instant code completion frameworks in near real-time? In this paper, we introduce SecureFalcon, an innovative model architecture with only 121 million parameters derived from the Falcon-40B model and explicitly tailored for classifying software vulnerabilities. To achieve the best performance, we trained our model using two datasets, namely the FormAI dataset and the FalconVulnDB. The FalconVulnDB is a combination of recent public datasets, namely the SySeVR framework, Draper VDISC, Bigvul, Diversevul, SARD Juliet, and ReVeal datasets. These datasets contain the top 25 most dangerous software weaknesses, such as CWE-119, CWE-120, CWE-476, CWE-122, CWE-190, CWE-121, CWE-78, CWE-787, CWE-20, and CWE-762. SecureFalcon achieves 94% accuracy in binary classification and up to 92% in multiclassification, with instant CPU inference times. It outperforms existing models such as BERT, RoBERTa, CodeBERT, and traditional ML algorithms, promising to push the boundaries of software vulnerability detection and instant code completion frameworks.

100.0NIMay 2

6G Needs Agents: Toward Agentic AI-Native Networks for Autonomous Intelligence

Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah

Sixth-generation (6G) networks are increasingly envisioned as AI-native infrastructures integrating communication, sensing, and computing into a unified fabric. However, existing approaches remain largely optimization-centric, relying on closed-loop control with limited reasoning capability. In this paper, we argue for a paradigm shift toward Agentic AI-Native 6G, in which Large Language Model (LLM)-based agents operate as bounded, policy-governed reasoning entities within a semantic control plane layered above deterministic 3GPP infrastructure. We propose a four-layer architecture that integrates deterministic network infrastructure, semantic abstraction of intent and context, hierarchical reasoning, and a distributed multi-agent fabric spanning device, edge, and core domains. To assess feasibility, we develop a proof-of-concept agentic reasoning and orchestration framework and conduct an extensive empirical study using a domain-specific 6G benchmark under realistic deployment constraints. Our results reveal a fundamental tradeoff between reasoning capability and system efficiency, showing that no single model simultaneously satisfies latency, throughput, and accuracy requirements. Instead, heterogeneous deployment of LLM agents across the device--edge--core continuum is necessary to balance these constraints. We further demonstrate that quantization introduces non-uniform effects across models, reinforcing the need for system-level optimization rather than model-level compression alone. These findings establish agentic intelligence as a viable architectural direction for 6G and highlight key challenges in achieving scalable, trustworthy, and self-reasoning networks. All experimental results and evaluation scripts are publicly available to support reproducibility.

CRSep 4, 2024

GenDFIR: Advancing Cyber Incident Timeline Analysis Through Retrieval Augmented Generation and Large Language Models

Fatma Yasmine Loumachi, Mohamed Chahine Ghanem, Mohamed Amine Ferrag

Cyber timeline analysis, or forensic timeline analysis, is crucial in Digital Forensics and Incident Response (DFIR). It examines artefacts and events particularly timestamps and metadata to detect anomalies, establish correlations, and reconstruct incident timelines. Traditional methods rely on structured artefacts, such as logs and filesystem metadata, using specialised tools for evidence identification and feature extraction. This paper introduces GenDFIR, a framework leveraging large language models (LLMs), specifically Llama 3.1 8B in zero shot mode, integrated with a Retrieval-Augmented Generation (RAG) agent. Incident data is preprocessed into a structured knowledge base, enabling the RAG agent to retrieve relevant events based on user prompts. The LLM interprets this context, offering semantic enrichment. Tested on synthetic data in a controlled environment, results demonstrate GenDFIR's reliability and robustness, showcasing LLMs potential to automate timeline analysis and advance threat detection.

34.4LGMay 7

VARS-FL: Validation-Aligned Client Selection for Non-IID Federated Learning in IoT Systems

Mohamed Lakas, Mohamed Amine Ferrag

Federated learning (FL) systems typically employ stateless client selection, treating each communication round independently and ignoring accumulated evidence of client contribution quality. Under non-IID data, this leads to slow convergence and unstable training, particularly when selection relies on local proxies (e.g., training loss) that are misaligned with the global optimization objective. These challenges are especially pronounced in Internet of Things (IoT) and Industrial IoT (IIoT) environments, where data is highly heterogeneous and distributed across devices observing different traffic patterns. In this paper, we propose VARS-FL (Validation-Aligned Reputation Scoring for Federated Learning), a client selection framework that quantifies each client's contribution using the reduction in server-side validation loss induced by its update. These per-round signals are aggregated into a Reputation score that combines a sliding-window average of recent contributions with a logarithmically scaled participation term, enabling robust exploration-exploitation selection. VARS-FL requires no changes to local training or aggregation and remains fully compatible with standard FedAvg. We evaluate VARS-FL on a 15-class non-IID IoT intrusion detection task using the Edge-IIoTset dataset, with 100 clients across multiple seeds, and compare it against FedAvg, Oort, and Power-of-Choice. VARS-FL consistently improves accuracy, F1-Macro, and loss, while accelerating convergence (up to 36% fewer rounds to reach 80% accuracy). These results demonstrate that validation-aligned, history-aware client selection provides a more reliable and efficient training process for federated learning in heterogeneous IoT environments.

53.4AIMay 12

LISA: Cognitive Arbitration for Signal-Free Autonomous Intersection Management

Abderrahmane Lakas, Mohamed Amine Ferrag, Merouane Debbah

Large language models (LLMs) show strong potential for Intelligent Transportation Systems (ITS), particularly in tasks requiring situational reasoning and multi-agent coordination. These capabilities make them well suited for cooperative driving, where rule-based approaches struggle in complex and dynamic traffic environments. Intersection management remains especially challenging due to conflicting right-of-way demands, heterogeneous vehicle priorities, and vehicle-specific kinematic constraints that must be resolved in real time. However, existing approaches typically use LLMs as auxiliary components on top of signal-based systems rather than as primary decision-makers. Signal controllers remain vehicle-agnostic, reservation-based methods lack intent awareness, and recent LLM-based systems still depend on signal infrastructure. In addition, LLM inference latency limits their use in sub-second control settings. We propose LISA (LLM-Based Intent-Driven Speed Advisory), a signal-free cognitive arbitration framework for autonomous intersection management. LISA uses an LLM to reason over declared vehicle intents, incorporating priority classes, queue pressure, and energy preferences. We evaluate LISA against fixed-cycle control, SCATS, AIM, and GLOSA across varying traffic loads. Results show that LISA reduces mean control delay by up to 89.1% and maintains Level of Service C while all non-LLM baselines degrade to Level of Service F. Under near-saturated demand, LISA reduces mean waiting time by 93% and peak queue length by 60.6% relative to fixed-cycle control. It also lowers fuel consumption by up to 48.8% and achieves 86.2% intent satisfaction, compared to 61.2% for the best non-LLM method. These results demonstrate that LLM-based reasoning can enable real-time, signal-free intersection management.

LGJun 18, 2025Code

I Know Which LLM Wrote Your Code Last Summer: LLM generated Code Stylometry for Authorship Attribution

Tamas Bisztray, Bilel Cherif, Richard A. Dubniczky et al.

Detecting AI-generated code, deepfakes, and other synthetic content is an emerging research challenge. As code generated by Large Language Models (LLMs) becomes more common, identifying the specific model behind each sample is increasingly important. This paper presents the first systematic study of LLM authorship attribution for C programs. We released CodeT5-Authorship, a novel model that uses only the encoder layers from the original CodeT5 encoder-decoder architecture, discarding the decoder to focus on classification. Our model's encoder output (first token) is passed through a two-layer classification head with GELU activation and dropout, producing a probability distribution over possible authors. To evaluate our approach, we introduce LLM-AuthorBench, a benchmark of 32,000 compilable C programs generated by eight state-of-the-art LLMs across diverse tasks. We compare our model to seven traditional ML classifiers and eight fine-tuned transformer models, including BERT, RoBERTa, CodeBERT, ModernBERT, DistilBERT, DeBERTa-V3, Longformer, and LoRA-fine-tuned Qwen2-1.5B. In binary classification, our model achieves 97.56% accuracy in distinguishing C programs generated by closely related models such as GPT-4.1 and GPT-4o, and 95.40% accuracy for multi-class attribution among five leading LLMs (Gemini 2.5 Flash, Claude 3.5 Haiku, GPT-4.1, Llama 3.3, and DeepSeek-V3). To support open science, we release the CodeT5-Authorship architecture, the LLM-AuthorBench benchmark, and all relevant Google Colab scripts on GitHub: https://github.com/LLMauthorbench/.

AIFeb 12, 2024

CyberMetric: A Benchmark Dataset based on Retrieval-Augmented Generation for Evaluating LLMs in Cybersecurity Knowledge

Norbert Tihanyi, Mohamed Amine Ferrag, Ridhi Jain et al.

Large Language Models (LLMs) are increasingly used across various domains, from software development to cyber threat intelligence. Understanding all the different fields of cybersecurity, which includes topics such as cryptography, reverse engineering, and risk assessment, poses a challenge even for human experts. To accurately test the general knowledge of LLMs in cybersecurity, the research community needs a diverse, accurate, and up-to-date dataset. To address this gap, we present CyberMetric-80, CyberMetric-500, CyberMetric-2000, and CyberMetric-10000, which are multiple-choice Q&A benchmark datasets comprising 80, 500, 2000, and 10,000 questions respectively. By utilizing GPT-3.5 and Retrieval-Augmented Generation (RAG), we collected documents, including NIST standards, research papers, publicly accessible books, RFCs, and other publications in the cybersecurity domain, to generate questions, each with four possible answers. The results underwent several rounds of error checking and refinement. Human experts invested over 200 hours validating the questions and solutions to ensure their accuracy and relevance, and to filter out any questions unrelated to cybersecurity. We have evaluated and compared 25 state-of-the-art LLM models on the CyberMetric datasets. In addition to our primary goal of evaluating LLMs, we involved 30 human participants to solve CyberMetric-80 in a closed-book scenario. The results can serve as a reference for comparing the general cybersecurity knowledge of humans and LLMs. The findings revealed that GPT-4o, GPT-4-turbo, Mixtral-8x7B-Instruct, Falcon-180B-Chat, and GEMINI-pro 1.0 were the best-performing LLMs. Additionally, the top LLMs were more accurate than humans on CyberMetric-80, although highly experienced human experts still outperformed small models such as Llama-3-8B, Phi-2 or Gemma-7b.

AIApr 28, 2025

From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review

Mohamed Amine Ferrag, Norbert Tihanyi, Merouane Debbah

Large language models and autonomous AI agents have evolved rapidly, resulting in a diverse array of evaluation benchmarks, frameworks, and collaboration protocols. However, the landscape remains fragmented and lacks a unified taxonomy or comprehensive survey. Therefore, we present a side-by-side comparison of benchmarks developed between 2019 and 2025 that evaluate these models and agents across multiple domains. In addition, we propose a taxonomy of approximately 60 benchmarks that cover general and academic knowledge reasoning, mathematical problem-solving, code generation and software engineering, factual grounding and retrieval, domain-specific evaluations, multimodal and embodied tasks, task orchestration, and interactive assessments. Furthermore, we review AI-agent frameworks introduced between 2023 and 2025 that integrate large language models with modular toolkits to enable autonomous decision-making and multi-step reasoning. Moreover, we present real-world applications of autonomous AI agents in materials science, biomedical research, academic ideation, software engineering, synthetic data generation, chemical reasoning, mathematical problem-solving, geographic information systems, multimedia, healthcare, and finance. We then survey key agent-to-agent collaboration protocols, namely the Agent Communication Protocol (ACP), the Model Context Protocol (MCP), and the Agent-to-Agent Protocol (A2A). Finally, we discuss recommendations for future research, focusing on advanced reasoning strategies, failure modes in multi-agent LLM systems, automated scientific discovery, dynamic tool integration via reinforcement learning, integrated search capabilities, and security vulnerabilities in agent protocols.

CRMay 21, 2024

Generative AI in Cybersecurity: A Comprehensive Review of LLM Applications and Vulnerabilities

Mohamed Amine Ferrag, Fatima Alwahedi, Ammar Battah et al.

This paper provides a comprehensive review of the future of cybersecurity through Generative AI and Large Language Models (LLMs). We explore LLM applications across various domains, including hardware design security, intrusion detection, software engineering, design verification, cyber threat intelligence, malware detection, and phishing detection. We present an overview of LLM evolution and its current state, focusing on advancements in models such as GPT-4, GPT-3.5, Mixtral-8x7B, BERT, Falcon2, and LLaMA. Our analysis extends to LLM vulnerabilities, such as prompt injection, insecure output handling, data poisoning, DDoS attacks, and adversarial instructions. We delve into mitigation strategies to protect these models, providing a comprehensive look at potential attack scenarios and prevention techniques. Furthermore, we evaluate the performance of 42 LLM models in cybersecurity knowledge and hardware security, highlighting their strengths and weaknesses. We thoroughly evaluate cybersecurity datasets for LLM training and testing, covering the lifecycle from data creation to usage and identifying gaps for future research. In addition, we review new strategies for leveraging LLMs, including techniques like Half-Quadratic Quantization (HQQ), Reinforcement Learning with Human Feedback (RLHF), Direct Preference Optimization (DPO), Quantized Low-Rank Adapters (QLoRA), and Retrieval-Augmented Generation (RAG). These insights aim to enhance real-time cybersecurity defenses and improve the sophistication of LLM applications in threat detection and response. Our paper provides a foundational understanding and strategic direction for integrating LLMs into future cybersecurity frameworks, emphasizing innovation and robust model deployment to safeguard against evolving cyber threats.

CRApr 29, 2024

How secure is AI-generated Code: A Large-Scale Comparison of Large Language Models

Norbert Tihanyi, Tamas Bisztray, Mohamed Amine Ferrag et al.

This study compares state-of-the-art Large Language Models (LLMs) on their tendency to generate vulnerabilities when writing C programs using a neutral zero-shot prompt. Tihanyi et al. introduced the FormAI dataset at PROMISE'23, featuring 112,000 C programs generated by GPT-3.5-turbo, with over 51.24% identified as vulnerable. We extended that research with a large-scale study involving 9 state-of-the-art models such as OpenAI's GPT-4o-mini, Google's Gemini Pro 1.0, TII's 180 billion-parameter Falcon, Meta's 13 billion-parameter Code Llama, and several other compact models. Additionally, we introduce the FormAI-v2 dataset, which comprises 331 000 compilable C programs generated by these LLMs. Each program in the dataset is labeled based on the vulnerabilities detected in its source code through formal verification, using the Efficient SMT-based Context-Bounded Model Checker (ESBMC). This technique minimizes false positives by providing a counterexample for the specific vulnerability and reduces false negatives by thoroughly completing the verification process. Our study reveals that at least 62.07% of the generated programs are vulnerable. The differences between the models are minor, as they all show similar coding errors with slight variations. Our research highlights that while LLMs offer promising capabilities for code generation, deploying their output in a production environment requires proper risk assessment and validation.

LGMar 26, 2025

Reasoning Beyond Limits: Advances and Open Problems for LLMs

Mohamed Amine Ferrag, Norbert Tihanyi, Merouane Debbah

Recent generative reasoning breakthroughs have transformed how large language models (LLMs) tackle complex problems by dynamically retrieving and refining information while generating coherent, multi-step thought processes. Techniques such as inference-time scaling, reinforcement learning, supervised fine-tuning, and distillation have been successfully applied to models like DeepSeek-R1, OpenAI's o1 & o3, GPT-4o, Qwen-32B, and various Llama variants, resulting in enhanced reasoning capabilities. In this paper, we provide a comprehensive analysis of the top 27 LLM models released between 2023 and 2025 (including models such as Mistral AI Small 3 24B, DeepSeek-R1, Search-o1, QwQ-32B, and phi-4). Then, we present an extensive overview of training methodologies that spans general training approaches, mixture-of-experts (MoE) and architectural innovations, retrieval-augmented generation (RAG), chain-of-thought and self-improvement techniques, as well as test-time compute scaling, distillation, and reinforcement learning (RL) methods. Finally, we discuss the key challenges in advancing LLM capabilities, including improving multi-step reasoning without human supervision, overcoming limitations in chained tasks, balancing structured prompts with flexibility, and enhancing long-context retrieval and external tool integration.

AIOct 20, 2024

Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model Confidence

Norbert Tihanyi, Tamas Bisztray, Richard A. Dubniczky et al.

As machine intelligence evolves, the need to test and compare the problem-solving abilities of different AI models grows. However, current benchmarks are often simplistic, allowing models to perform uniformly well and making it difficult to distinguish their capabilities. Additionally, benchmarks typically rely on static question-answer pairs that the models might memorize or guess. To address these limitations, we introduce Dynamic Intelligence Assessment (DIA), a novel methodology for testing AI models using dynamic question templates and improved metrics across multiple disciplines such as mathematics, cryptography, cybersecurity, and computer science. The accompanying dataset, DIA-Bench, contains a diverse collection of challenge templates with mutable parameters presented in various formats, including text, PDFs, compiled binaries, visual puzzles, and CTF-style cybersecurity challenges. Our framework introduces four new metrics to assess a model's reliability and confidence across multiple attempts. These metrics revealed that even simple questions are frequently answered incorrectly when posed in varying forms, highlighting significant gaps in models' reliability. Notably, API models like GPT-4o often overestimated their mathematical capabilities, while ChatGPT-4o demonstrated better performance due to effective tool usage. In self-assessment, OpenAI's o1-mini proved to have the best judgement on what tasks it should attempt to solve. We evaluated 25 state-of-the-art LLMs using DIA-Bench, showing that current models struggle with complex tasks and often display unexpectedly low confidence, even with simpler questions. The DIA framework sets a new standard for assessing not only problem-solving but also a model's adaptive intelligence and ability to assess its limitations. The dataset is publicly available on the project's page: https://github.com/DIA-Bench.

CRJun 29, 2025

From Prompt Injections to Protocol Exploits: Threats in LLM-Powered AI Agents Workflows

Mohamed Amine Ferrag, Norbert Tihanyi, Djallel Hamouda et al.

Autonomous AI agents powered by large language models (LLMs) with structured function-calling interfaces have dramatically expanded capabilities for real-time data retrieval, complex computation, and multi-step orchestration. Yet, the explosive proliferation of plugins, connectors, and inter-agent protocols has outpaced discovery mechanisms and security practices, resulting in brittle integrations vulnerable to diverse threats. In this survey, we introduce the first unified, end-to-end threat model for LLM-agent ecosystems, spanning host-to-tool and agent-to-agent communications, formalize adversary capabilities and attacker objectives, and catalog over thirty attack techniques. Specifically, we organized the threat model into four domains: Input Manipulation (e.g., prompt injections, long-context hijacks, multimodal adversarial inputs), Model Compromise (e.g., prompt- and parameter-level backdoors, composite and encrypted multi-backdoors, poisoning strategies), System and Privacy Attacks (e.g., speculative side-channels, membership inference, retrieval poisoning, social-engineering simulations), and Protocol Vulnerabilities (e.g., exploits in Model Context Protocol (MCP), Agent Communication Protocol (ACP), Agent Network Protocol (ANP), and Agent-to-Agent (A2A) protocol). For each category, we review representative scenarios, assess real-world feasibility, and evaluate existing defenses. Building on our threat taxonomy, we identify key open challenges and future research directions, such as securing MCP deployments through dynamic trust management and cryptographic provenance tracking; designing and hardening Agentic Web Interfaces; and achieving resilience in multi-agent and federated environments. Our work provides a comprehensive reference to guide the design of robust defense mechanisms and establish best practices for resilient LLM-agent workflows.

SEMar 13, 2025

Vulnerability Detection: From Formal Verification to Large Language Models and Hybrid Approaches: A Comprehensive Overview

Norbert Tihanyi, Tamas Bisztray, Mohamed Amine Ferrag et al.

Software testing and verification are critical for ensuring the reliability and security of modern software systems. Traditionally, formal verification techniques, such as model checking and theorem proving, have provided rigorous frameworks for detecting bugs and vulnerabilities. However, these methods often face scalability challenges when applied to complex, real-world programs. Recently, the advent of Large Language Models (LLMs) has introduced a new paradigm for software analysis, leveraging their ability to understand insecure coding practices. Although LLMs demonstrate promising capabilities in tasks such as bug prediction and invariant generation, they lack the formal guarantees of classical methods. This paper presents a comprehensive study of state-of-the-art software testing and verification, focusing on three key approaches: classical formal methods, LLM-based analysis, and emerging hybrid techniques, which combine their strengths. We explore each approach's strengths, limitations, and practical applications, highlighting the potential of hybrid systems to address the weaknesses of standalone methods. We analyze whether integrating formal rigor with LLM-driven insights can enhance the effectiveness and scalability of software verification, exploring their viability as a pathway toward more robust and adaptive testing frameworks.

CRMar 12, 2025

CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection

Richard A. Dubniczky, Krisztofer Zoltán Horvát, Tamás Bisztray et al.

Identifying vulnerabilities in source code is crucial, especially in critical software components. Existing methods such as static analysis, dynamic analysis, formal verification, and recently Large Language Models are widely used to detect security flaws. This paper introduces CASTLE (CWE Automated Security Testing and Low-Level Evaluation), a benchmarking framework for evaluating the vulnerability detection capabilities of different methods. We assess 13 static analysis tools, 10 LLMs, and 2 formal verification tools using a hand-crafted dataset of 250 micro-benchmark programs covering 25 common CWEs. We propose the CASTLE Score, a novel evaluation metric to ensure fair comparison. Our results reveal key differences: ESBMC (a formal verification tool) minimizes false positives but struggles with vulnerabilities beyond model checking, such as weak cryptography or SQL injection. Static analyzers suffer from high false positives, increasing manual validation efforts for developers. LLMs perform exceptionally well in the CASTLE dataset when identifying vulnerabilities in small code snippets. However, their accuracy declines, and hallucinations increase as the code size grows. These results suggest that LLMs could play a pivotal role in future security solutions, particularly within code completion frameworks, where they can provide real-time guidance to prevent vulnerabilities. The dataset is accessible at https://github.com/CASTLE-Benchmark.

CROct 12, 2025

The Hidden DNA of LLM-Generated JavaScript: Structural Patterns Enable High-Accuracy Authorship Attribution

Norbert Tihanyi, Bilel Cherif, Richard A. Dubniczky et al.

In this paper, we present the first large-scale study exploring whether JavaScript code generated by Large Language Models (LLMs) can reveal which model produced it, enabling reliable authorship attribution and model fingerprinting. With the rapid rise of AI-generated code, attribution is playing a critical role in detecting vulnerabilities, flagging malicious content, and ensuring accountability. While AI-vs-human detection usually treats AI as a single category we show that individual LLMs leave unique stylistic signatures, even among models belonging to the same family or parameter size. To this end, we introduce LLM-NodeJS, a dataset of 50,000 Node.js back-end programs from 20 large language models. Each has four transformed variants, yielding 250,000 unique JavaScript samples and two additional representations (JSIR and AST) for diverse research applications. Using this dataset, we benchmark traditional machine learning classifiers against fine-tuned Transformer encoders and introduce CodeT5-JSA, a custom architecture derived from the 770M-parameter CodeT5 model with its decoder removed and a modified classification head. It achieves 95.8% accuracy on five-class attribution, 94.6% on ten-class, and 88.5% on twenty-class tasks, surpassing other tested models such as BERT, CodeBERT, and Longformer. We demonstrate that classifiers capture deeper stylistic regularities in program dataflow and structure, rather than relying on surface-level features. As a result, attribution remains effective even after mangling, comment removal, and heavy code transformations. To support open science and reproducibility, we release the LLM-NodeJS dataset, Google Colab training scripts, and all related materials on GitHub: https://github.com/LLM-NodeJS-dataset.

SEMay 24, 2023

A New Era in Software Security: Towards Self-Healing Software via Large Language Models and Formal Verification

Norbert Tihanyi, Ridhi Jain, Yiannis Charalambous et al.

This paper introduces an innovative approach that combines Large Language Models (LLMs) with Formal Verification strategies for automatic software vulnerability repair. Initially, we employ Bounded Model Checking (BMC) to identify vulnerabilities and extract counterexamples. These counterexamples are supported by mathematical proofs and the stack trace of the vulnerabilities. Using a specially designed prompt, we combine the original source code with the identified vulnerability, including its stack trace and counterexample that specifies the line number and error type. This combined information is then fed into an LLM, which is instructed to attempt to fix the code. The new code is subsequently verified again using BMC to ensure the fix succeeded. We present the ESBMC-AI framework as a proof of concept, leveraging the well-recognized and industry-adopted Efficient SMT-based Context-Bounded Model Checker (ESBMC) and a pre-trained transformer model to detect and fix errors in C programs, particularly in critical software components. We evaluated our approach on 50,000 C programs randomly selected from the FormAI dataset with their respective vulnerability classifications. Our results demonstrate ESBMC-AI's capability to automate the detection and repair of issues such as buffer overflow, arithmetic overflow, and pointer dereference failures with high accuracy. ESBMC-AI is a pioneering initiative, integrating LLMs with BMC techniques, offering potential integration into the continuous integration and deployment (CI/CD) process within the software development lifecycle.

CRDec 15, 2021

Cybersecurity Revisited: Honeytokens meet Google Authenticator

Vasilis Papaspirou, Maria Papathanasaki, Leandros Maglaras et al.

Although sufficient authentication mechanisms were enhanced by the use of two or more factors that resulted in new multi factor authentication schemes, more sophisticated and targeted attacks have shown they are also vulnerable. This research work proposes a novel two factor authentication system that incorporates honeytokens into the two factor authentication process. The current implementation collaborates with Google authenticator. The novelty and simplicity of the presented approach aims at providing additional layers of security and protection into a system and thus making it more secure through a stronger and more efficient authentication mechanism.

CRDec 16, 2020

A novel Two-Factor HoneyToken Authentication Mechanism

Vassilis Papaspirou, Leandros Maglaras, Mohamed Amine Ferrag et al.

The majority of systems rely on user authentication on passwords, but passwords have so many weaknesses and widespread use that easily raise significant security concerns, regardless of their encrypted form. Users hold the same password for different accounts, administrators never check password files for flaws that might lead to a successful cracking, and the lack of a tight security policy regarding regular password replacement are a few problems that need to be addressed. The proposed research work aims at enhancing this security mechanism, prevent penetrations, password theft, and attempted break-ins towards securing computing systems. The selected solution approach is two-folded; it implements a two-factor authentication scheme to prevent unauthorized access, accompanied by Honeyword principles to detect corrupted or stolen tokens. Both can be integrated into any platform or web application with the use of QR codes and a mobile phone.

CRJan 27, 2019

Authentication and Authorization for Mobile IoT Devices using Bio-features: Recent Advances and Future Trends

Mohamed Amine Ferrag, Leandros Maglaras, Abdelouahid Derhab

Bio-features are fast becoming a key tool to authenticate the IoT devices; in this sense, the purpose of this investigation is to summaries the factors that hinder biometrics models' development and deployment on a large scale, including human physiological (e.g., face, eyes, fingerprints-palm, or electrocardiogram) and behavioral features (e.g., signature, voice, gait, or keystroke). The different machine learning and data mining methods used by authentication and authorization schemes for mobile IoT devices are provided. Threat models and countermeasures used by biometrics-based authentication schemes for mobile IoT devices are also presented. More specifically, We analyze the state of the art of the existing biometric-based authentication schemes for IoT devices. Based on the current taxonomy, We conclude our paper with different types of challenges for future research efforts in biometrics-based authentication schemes for IoT devices.

CRJan 12, 2019

Threats, Protection and Attribution of Cyber Attacks on Critical Infrastructures

Leandros Maglaras, Mohamed Amine Ferrag, Abdelouahid Derhab et al.

As Critical National Infrastructures are becoming more vulnerable to cyber attacks, their protection becomes a significant issue for any organization as well as a nation. Moreover, the ability to attribute is a vital element of avoiding impunity in cyberspace. In this article, we present main threats to critical infrastructures along with protective measures that one nation can take, and which are classified according to legal, technical, organizational, capacity building, and cooperation aspects. Finally we provide an overview of current methods and practices regarding cyber attribution and cyber peace keeping

CRDec 21, 2018

A Novel Hierarchical Intrusion Detection System based on Decision Tree and Rules-based Models

Ahmed Ahmim, Leandros Maglaras, Mohamed Amine Ferrag et al.

This paper proposes a novel intrusion detection system (IDS) that combines different classifier approaches which are based on decision tree and rules-based concepts, namely, REP Tree, JRip algorithm and Forest PA. Specifically, the first and second method take as inputs features of the data set, and classify the network traffic as Attack/Benign. The third classifier uses features of the initial data set in addition to the outputs of the first and the second classifier as inputs. The experimental results obtained by analyzing the proposed IDS using the CICIDS2017 dataset, attest their superiority in terms of accuracy, detection rate, false alarm rate and time overhead as compared to state of the art existing schemes.

CRJun 24, 2018

Blockchain Technologies for the Internet of Things: Research Issues and Challenges

Mohamed Amine Ferrag, Makhlouf Derdour, Mithun Mukherjee et al.

This paper presents a comprehensive survey of the existing blockchain protocols for the Internet of Things (IoT) networks. We start by describing the blockchains and summarizing the existing surveys that deal with blockchain technologies. Then, we provide an overview of the application domains of blockchain technologies in IoT, e.g, Internet of Vehicles, Internet of Energy, Internet of Cloud, Fog computing, etc. Moreover, we provide a classification of threat models, which are considered by blockchain protocols in IoT networks, into five main categories, namely, identity-based attacks, manipulation-based attacks, cryptanalytic attacks, reputation-based attacks, and service-based attacks. In addition, we provide a taxonomy and a side-by-side comparison of the state-of-the-art methods towards secure and privacy-preserving blockchain technologies with respect to the blockchain model, specific security goals, performance, limitations, computation complexity, and communication overhead. Based on the current survey, we highlight open research challenges and discuss possible future research directions in the blockchain technologies for IoT.

CRMar 27, 2018

Authentication schemes for Smart Mobile Devices: Threat Models, Countermeasures, and Open Research Issues

Mohamed Amine Ferrag, Leandros Maglaras, Abdelouahid Derhab et al.

This paper presents a comprehensive investigation of authentication schemes for smart mobile devices. We start by providing an overview of existing survey articles published in the recent years that deal with security for mobile devices. Then, we describe and give a classification of threat models in smart mobile devices in five categories, including, identity-based attacks, eavesdropping-based attacks, combined eavesdropping and identity-based attacks, manipulation-based attacks, and service-based attacks. We also provide a classification of countermeasures into four types of categories, including, cryptographic functions, personal identification, classification algorithms, and channel characteristics. According to these, we categorize authentication schemes for smart mobile devices in four categories, namely, 1) biometric-based authentication schemes, 2) channel-based authentication schemes, 3) factor-based authentication schemes, and 4) ID-based authentication schemes. In addition, we provide a taxonomy and comparison of authentication schemes for smart mobile devices in the form of tables. Finally, we identify open challenges and future research directions.

CRNov 1, 2017

Internet of Cloud: Security and Privacy issues

Allan Cook, Michael Robinson, Mohamed Amine Ferrag et al.

The synergy between the cloud and the IoT has emerged largely due to the cloud having attributes which directly benefit the IoT and enable its continued growth. IoT adopting Cloud services has brought new security challenges. In this book chapter, we pursue two main goals: 1) to analyse the different components of Cloud computing and the IoT and 2) to present security and privacy problems that these systems face. We thoroughly investigate current security and privacy preservation solutions that exist in this area, with an eye on the Industrial Internet of Things, discuss open issues and propose future directions

CRAug 14, 2017

Security for 4G and 5G Cellular Networks: A Survey of Existing Authentication and Privacy-preserving Schemes

Mohamed Amine Ferrag, Leandros Maglaras, Antonios Argyriou et al.

This paper presents a comprehensive survey of existing authentication and privacy-preserving schemes for 4G and 5G cellular networks. We start by providing an overview of existing surveys that deal with 4G and 5G communications, applications, standardization, and security. Then, we give a classification of threat models in 4G and 5G cellular networks in four categories, including, attacks against privacy, attacks against integrity, attacks against availability, and attacks against authentication. We also provide a classification of countermeasures into three types of categories, including, cryptography methods, humans factors, and intrusion detection methods. The countermeasures and informal and formal security analysis techniques used by the authentication and privacy preserving schemes are summarized in form of tables. Based on the categorization of the authentication and privacy models, we classify these schemes in seven types, including, handover authentication with privacy, mutual authentication with privacy, RFID authentication with privacy, deniable authentication with privacy, authentication with mutual anonymity, authentication and key agreement with privacy, and three-factor authentication with privacy. In addition, we provide a taxonomy and comparison of authentication and privacy-preserving schemes for 4G and 5G cellular networks in form of tables. Based on the current survey, several recommendations for further research are discussed at the end of this paper.

CRDec 21, 2016

Authentication Protocols for Internet of Things: A Comprehensive Survey

Mohamed Amine Ferrag, Leandros A. Maglaras, Helge Janicke et al.

In this paper, we present a comprehensive survey of authentication protocols for Internet of Things (IoT). Specifically, we select and in-detail examine more than forty authentication protocols developed for or applied in the context of the IoT under four environments, including: (1) Machine to machine communications (M2M), (2) Internet of Vehicles (IoV), (3) Internet of Energy (IoE), and (4) Internet of Sensors (IoS). We start by reviewing all survey articles published in the recent years that focusing on different aspects of the IoT idea. Then, we review threat models, countermeasures, and formal security verification techniques used in authentication protocols for the IoT. In addition, we provide a taxonomy and comparison of authentication protocols for the IoT in form of tables in five terms, namely, network model, goals, main processes, computation complexity, and communication overhead. Based on the current survey, we identify open issues and suggest hints for future research.

CRNov 23, 2016

A Survey on Privacy-preserving Schemes for Smart Grid Communications

Mohamed Amine Ferrag, Leandros A. Maglaras, Helge Janicke et al.

In this paper, we present a comprehensive survey of privacy-preserving schemes for Smart Grid communications. Specifically, we select and in-detail examine thirty privacy preserving schemes developed for or applied in the context of Smart Grids. Based on the communication and system models, we classify these schemes that are published between 2013 and 2016, in five categories, including, 1) Smart grid with the advanced metering infrastructure, 2) Data aggregation communications, 3) Smart grid marketing architecture, 4) Smart community of home gateways, and 5) Vehicle-to grid architecture. For each scheme, we survey the attacks of leaking privacy, countermeasures, and game theoretic approaches. In addition, we review the survey articles published in the recent years that deal with Smart Grids communications, applications, standardization, and security. Based on the current survey, several recommendations for further research are discussed at the end of this paper.

CROct 19, 2016

Privacy-preserving schemes for Ad Hoc Social Networks: A survey

Mohamed Amine Ferrag, Leandros Maglaras, Ahmed Ahmim

In this paper, we review the state of the art of privacy-preserving schemes for ad hoc social networks, including, mobile social networks (MSNs) and vehicular social networks (VSNs). Specifically, we select and in-detail examine thirty-three privacy preserving schemes developed for or applied in the context of ad hoc social networks. These schemes are published between 2008 and 2016. Based on this existing privacy preservation schemes, we survey privacy preservation models, including location privacy, identity privacy, anonymity, traceability, interest privacy, backward privacy, and content oriented privacy. The recent important attacks of leaking privacy, countermeasures, and game theoretic approaches in VSNs and MSNs are summarized in form of tables. In addition, an overview of recommendations for further research is also provided. With this survey, readers can have a more thorough understanding of research trends in privacy-preserving schemes for ad hoc social networks