CR | Nov 2, 2023
Artificial Intelligence Ethics Education in Cybersecurity: Challenges and Opportunities: a focus group report
Diane Jackson, Sorin Adam Matei, Elisa Bertino
The emergence of AI tools in cybersecurity creates many opportunities and uncertainties. A focus group with advanced graduate students in cybersecurity revealed the potential depth and breadth of the challenges and opportunities. The salient issues are access to open-source or free tools, documentation, curricular diversity, and clear articulation of ethical principles for AI cybersecurity education. Confronting the "black box" mentality in AI cybersecurity work is also of the greatest importance, and it should be paired with earlier and deeper education in foundational AI. Systems thinking and effective communication were considered relevant areas for educational improvement. Future AI educators and practitioners need to address these issues by implementing rigorous technical training curricula, clear documentation, and frameworks for ethically monitoring AI, combined with critical thinking, systems thinking, and communication skills.
CR | Jul 18, 2024
CellularLint: A Systematic Approach to Identify Inconsistent Behavior in Cellular Network Specifications
Mirza Masfiqur Rahman, Imtiaz Karim, Elisa Bertino
In recent years, there has been a growing focus on scrutinizing the security of cellular networks, often attributing security vulnerabilities to issues in the underlying protocol design descriptions. These protocol design specifications, typically extensive documents that are thousands of pages long, can harbor inaccuracies, underspecifications, implicit assumptions, and internal inconsistencies. In light of the evolving landscape, we introduce CellularLint, a semi-automatic framework for inconsistency detection within the 4G and 5G standards that capitalizes on a suite of natural language processing techniques. Our proposed method uses a revamped few-shot learning mechanism on domain-adapted large language models. Because these models are pre-trained on a vast corpus of cellular network protocols, CellularLint can simultaneously detect inconsistencies at various levels of semantics and across practical use cases. In doing so, CellularLint significantly advances the automated analysis of protocol specifications in a scalable fashion. In our investigation, we focused on the Non-Access Stratum (NAS) and the security specifications of 4G and 5G networks, ultimately uncovering 157 inconsistencies with 82.67% accuracy. After verifying these inconsistencies on open-source implementations and 17 commercial devices, we confirm that they indeed have a substantial impact on design decisions, potentially leading to concerns related to privacy, integrity, availability, and interoperability.
CR | May 5
Quantum-Resistant Networks: A Review of Primitives, Protocols and Best Practices
Elisa Bertino, Ramana Kompella, Ashish Kundu et al.
Large-scale quantum computers threaten the public-key cryptographic foundations underpinning today's network security infrastructures. While significant progress has been made in standardizing post-quantum cryptographic (PQC) primitives and adapting individual protocols such as TLS and SSH, far less attention has been paid to the broader architectural consequences of the post-quantum transition for networked systems. In particular, many real-world deployments such as mobile networks, industrial control systems, IoT environments, and regulated infrastructures cannot assume the universal availability, deployability, or desirability of PQ public-key infrastructures. This paper presents the first comprehensive systematization of PQ-resistant network architectures, focusing on key distribution and management as a system-level design problem rather than a protocol-local substitution. We introduce a unified taxonomy spanning cryptographic foundations (symmetric-only, PQ-PKI, hybrid, and information-theoretic multi-path), key-distribution architectures (centralized, hierarchical, replicated, threshold, MPC-backed, and serverless), trust and threat models, key-management lifecycle, and deployment environments. Using this framework, we analyze the security, scalability, and operational trade-offs of a wide range of architectures under realistic PQ adversary assumptions, including harvest-now, decrypt-later attacks and partial infrastructure compromise. Our study highlights fundamental gaps in existing approaches, clarifies when PQ-PKI is necessary or avoidable, and identifies promising research directions for building cryptographically agile, quantum-resilient network infrastructures.
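Among the key-distribution architectures in the taxonomy is the threshold family. One classic building block for threshold key management that needs no public-key assumptions is Shamir secret sharing, which is information-theoretically secure. The sketch below splits a symmetric key into `n` shares so that any `t` of them recover it. It is a generic illustration, not a construction from the paper; the field prime and function names are our own choices.

```python
import secrets

# A Mersenne prime large enough to hold 16-byte symmetric keys (illustrative choice).
PRIME = 2**127 - 1


def split_secret(secret: int, threshold: int, num_shares: int) -> list[tuple[int, int]]:
    """Split `secret` into points on a random polynomial of degree threshold - 1."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in the field")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    # The constant term is the secret; the other coefficients are uniformly random.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        # Horner evaluation of the polynomial at x, modulo PRIME.
        for c in reversed(coeffs):
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 recovers the constant term (the secret)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

For example, `split_secret(key, 3, 5)` yields five shares, and any three of them passed to `recover_secret` return `key`; fewer than three reveal nothing about it.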
LG | Jun 23, 2023
TrustGuard: GNN-based Robust and Explainable Trust Evaluation with Dynamicity Support
Jie Wang, Zheng Yan, Jiahe Lan et al.
Trust evaluation assesses trust relationships between entities and facilitates decision-making. Machine Learning (ML) shows great potential for trust evaluation owing to its learning capabilities. In recent years, Graph Neural Networks (GNNs), as a new ML paradigm, have demonstrated superiority in dealing with graph data. This has motivated researchers to explore their use in trust evaluation, as trust relationships among entities can be modeled as a graph. However, current trust evaluation methods that employ GNNs fail to fully capture the dynamic nature of trust, overlook the adverse effects of trust-related attacks, and cannot provide convincing explanations of their evaluation results. To address these problems, we propose TrustGuard, an accurate GNN-based trust evaluation model that supports trust dynamicity, is robust against typical attacks, and provides explanations through visualization. Specifically, TrustGuard is designed with a layered architecture that contains a snapshot input layer, a spatial aggregation layer, a temporal aggregation layer, and a prediction layer. Among them, the spatial aggregation layer adopts a defense mechanism to robustly aggregate local trust, and the temporal aggregation layer applies an attention mechanism for effective learning of temporal patterns. Extensive experiments on two real-world datasets show that TrustGuard outperforms state-of-the-art GNN-based trust evaluation models in both single-timeslot and multi-timeslot trust prediction, even in the presence of attacks. In addition, TrustGuard can explain its evaluation results by visualizing both spatial and temporal views.
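The temporal aggregation layer is described as applying attention across snapshots. A minimal sketch of that idea, assuming scaled dot-product attention with the most recent snapshot embedding as the query (the abstract does not specify the scoring function), combines one node's per-snapshot embeddings into a single vector:

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(snapshots):
    """Aggregate a node's per-snapshot embeddings (oldest first) into one vector.

    Assumption for this sketch: the latest embedding is the query, and the
    snapshot embeddings serve as both keys and values.
    """
    query = snapshots[-1]
    dim = len(query)
    # Scaled dot-product score between the query and each snapshot embedding.
    scores = [sum(q * k for q, k in zip(query, emb)) / math.sqrt(dim) for emb in snapshots]
    weights = softmax(scores)
    # Attention-weighted sum over time, dimension by dimension.
    aggregated = [sum(w * emb[d] for w, emb in zip(weights, snapshots)) for d in range(dim)]
    return aggregated, weights
```

The returned `weights` are also what a temporal-view visualization could display, since they show which snapshots the aggregation relied on.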
LG | May 6
Information Theoretic Adversarial Training of Large Language Models
Yiwei Zhang, Jeremiah Birrell, Reza Ebrahimi et al.
Large language models (LLMs) remain vulnerable to adversarial prompting despite advances in alignment and safety, often exhibiting harmful behaviors under novel attack strategies. While adversarial training can improve robustness, existing approaches are computationally expensive and difficult to scale. Recent continuous adversarial training methods, such as Continuous Adversarial Training (CAT) and Continuous Adversarial Preference Optimization (CAPO), address this challenge by leveraging gradient-based perturbations in the embedding space, enabling more efficient and expressive attacks. Building on this paradigm, we propose WARDEN, a distributionally robust adversarial training framework for LLMs that dynamically reweights adversarial examples through an f-divergence ambiguity set around the empirical training distribution. Our method optimizes the worst-case adversarial loss within a divergence ball around the empirical data distribution, automatically emphasizing harder adversarial examples. Using the convex dual formulation, the objective reduces to a log-sum-exp form under the KL divergence, with a dynamical parameter controlling the strength of reweighting. This study leads to a new class of information-theoretic objectives that significantly reduce attack success rates while maintaining model utility. Across multiple LLMs and attack settings, WARDEN substantially reduces attack success rates with computational and utility costs comparable to CAT-, CAPO-, and MixAT-based baselines, making it a practical approach for scalable robust alignment.
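The log-sum-exp form mentioned above follows from the standard dual of a KL ambiguity set: the worst-case expected loss is bounded by λ·log(mean(exp(loss / λ))) plus a radius term, and its gradient weights each example by softmax(loss / λ). The sketch below assumes λ is held fixed for a batch (our simplification; the name `lam` is ours) and drops the constant radius term. It shows the generic KL-DRO objective, not WARDEN's full training loop:

```python
import math


def kl_dro_loss(losses, lam):
    """lam * log(mean(exp(loss / lam))), computed stably via log-sum-exp."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    scaled = [l / lam for l in losses]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return lam * (lse - math.log(len(losses)))


def example_weights(losses, lam):
    """Gradient weights of kl_dro_loss: softmax(loss / lam), favoring harder examples."""
    scaled = [l / lam for l in losses]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

A large `lam` recovers the ordinary mean loss, while a small `lam` approaches the maximum per-example loss. In between, the objective interpolates and upweights the adversarial examples the model currently handles worst.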
CY | Nov 7, 2023
Educating for AI Cybersecurity Work and Research: Ethics, Systems Thinking, and Communication Requirements
Sorin Adam Matei, Elisa Bertino
The present study explored managerial and instructor perceptions of how well their newly employed cybersecurity workers or students are prepared to work effectively in a changing cybersecurity environment that includes AI tools. Specifically, we related perceptions of technical preparedness to ethical, systems thinking, and communication skills. We found that managers and professors perceive preparedness to use AI tools in cybersecurity to be significantly associated with all three non-technical skill sets. Most importantly, ethics is a clear leader in the network of relationships. Contrary to expectations that ethical concerns are left behind in the rush to adopt the most advanced AI tools in security, both higher education instructors and managers appreciate their role and see them as closely associated with technical prowess. Another significant finding is that professors overestimate students' ethical, systems thinking, and communication abilities compared to IT managers' perceptions of their newly employed IT workers.
IR | Jan 22, 2023
SPEC5G: A Dataset for 5G Cellular Network Protocol Analysis
Imtiaz Karim, Kazi Samin Mubasshir, Mirza Masfiqur Rahman et al.
5G is the fifth-generation cellular network protocol. It is the state-of-the-art global wireless standard, enabling an advanced kind of network designed to connect virtually everyone and everything with increased speed and reduced latency. Therefore, its development, analysis, and security are critical. However, all approaches to 5G protocol development and security analysis, e.g., property extraction, protocol summarization, and semantic analysis of the protocol specifications and implementations, are completely manual. To reduce this manual effort, in this paper we curate SPEC5G, the first-ever public 5G dataset for NLP research. The dataset contains 3,547,586 sentences with 134M words, drawn from 13,094 cellular network specifications and 13 online websites. By leveraging large-scale pre-trained language models that have achieved state-of-the-art results on NLP tasks, we use this dataset for security-related text classification and summarization. Security-related text classification can be used to extract relevant security-related properties for protocol testing. Summarization, on the other hand, can help developers and practitioners understand the protocol at a high level, which is itself a daunting task. Our results show the value of our 5G-centric dataset in automating 5G protocol analysis. We believe that SPEC5G will enable a new research direction into automatic analyses of the 5G cellular network protocol and numerous related downstream tasks. Our data and code are publicly available.
CV | Jun 1, 2023
Maximizing Information in Domain-Invariant Representation Improves Transfer Learning
Adrian Shuai Li, Elisa Bertino, Xuan-Hong Dang et al.
We propose MaxDIRep, a domain adaptation method that improves the decomposition of data representations into domain-independent and domain-dependent components. Existing methods, such as Domain-Separation Networks (DSN), use a weak orthogonality constraint between these components, which can lead to label-relevant features being partially encoded in the domain-dependent representation (DDRep) rather than the domain-independent representation (DIRep). As a result, information crucial for target-domain classification may be missing from the DIRep. MaxDIRep addresses this issue by applying a Kullback-Leibler (KL) divergence constraint to minimize the information content of the DDRep, thereby encouraging the DIRep to retain features that are both domain-invariant and predictive of target labels. Through geometric analysis and an ablation study on synthetic datasets, we show why DSN's weaker constraint can lead to suboptimal adaptation. Experiments on standard image benchmarks and a network intrusion detection task demonstrate that MaxDIRep achieves strong performance, works with pretrained models, and generalizes to non-image classification tasks.
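The abstract does not say how the KL constraint on the DDRep is parameterized. One common way to limit how much information a representation carries is a VAE-style bottleneck. Under that assumption, the encoder outputs a diagonal Gaussian for the DDRep, and the KL divergence to a standard normal is added to the training loss. The closed form for that penalty is sketched below as an illustration only (the function name is ours):

```python
import math


def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions.

    Assumption: the domain-dependent representation is encoded as a diagonal
    Gaussian; driving this term toward zero squeezes information out of it.
    """
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))
```

In such a setup the total objective would combine the classification loss on the DIRep with a weighted copy of this penalty. The penalty is zero only when the DDRep encoding collapses to the prior, so label-relevant features are pushed into the DIRep.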
CR | Jan 8
A Survey of Agentic AI and Cybersecurity: Challenges, Opportunities and Use-case Prototypes
Sahaya Jestus Lazer, Kshitiz Aryal, Maanak Gupta et al.
Agentic AI marks an important transition from single-step generative models to systems capable of reasoning, planning, acting, and adapting over long-lasting tasks. By integrating memory, tool use, and iterative decision cycles, these systems enable continuous, autonomous workflows in real-world environments. This survey examines the implications of agentic AI for cybersecurity. On the defensive side, agentic capabilities enable continuous monitoring, autonomous incident response, adaptive threat hunting, and fraud detection at scale. Conversely, the same properties amplify adversarial power by accelerating reconnaissance, exploitation, coordination, and social-engineering attacks. These dual-use dynamics expose fundamental gaps in existing governance, assurance, and accountability mechanisms, which were largely designed for non-autonomous and short-lived AI systems. To address these challenges, we survey emerging threat models, security frameworks, and evaluation pipelines tailored to agentic systems, and analyze systemic risks including agent collusion, cascading failures, oversight evasion, and memory poisoning. Finally, we present three representative use-case implementations that illustrate how agentic AI behaves in practical cybersecurity workflows, and how design choices shape reliability, safety, and operational effectiveness.
CR | Apr 8
Can Drift-Adaptive Malware Detectors Be Made Robust? Attacks and Defenses Under White-Box and Black-Box Threats
Adrian Shuai Li, Md Ajwad Akil, Elisa Bertino
Concept drift and adversarial evasion are two major challenges for deploying machine learning-based malware detectors. While both have been studied separately, their combination, the adversarial robustness of drift-adaptive detectors, remains unexplored. We study this problem using AdvDA, a recent malware detector that uses adversarial domain adaptation to align a labeled source domain with a target domain that has limited labels. The distribution shift between domains poses a unique challenge: robustness learned on the source may not transfer to the target, and existing defenses assume a fixed distribution. To address this, we propose a universal robustification framework that fine-tunes a pretrained AdvDA model on adversarially transformed inputs and is agnostic to the attack type and choice of transformations. We instantiate it with five defense variants spanning two threat models: white-box PGD attacks in the feature space and black-box MalGuise attacks that modify malware binaries via functionality-preserving control-flow mutations. Across nine defense configurations, five monthly adaptation windows on Windows malware, and three false-positive-rate operating points, we find the undefended AdvDA completely vulnerable to PGD (100% attack success) and moderately vulnerable to MalGuise (13% attack success). Our framework reduces these rates to as low as 3.2% and 5.1%, respectively, but the optimal strategy differs: source adversarial training is essential for PGD defenses yet counterproductive for MalGuise defenses, where target-only training suffices. Furthermore, robustness does not transfer across these two threat models. We provide deployment recommendations that balance robustness, detection accuracy, and computational cost.
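To make the white-box threat model concrete, the sketch below runs L-infinity projected gradient descent (PGD) in feature space against a toy logistic-regression detector with hypothetical weights. It is not AdvDA, and the `eps`, `alpha`, and `steps` values are illustrative. For a linear model, the gradient of the logit with respect to the input is the weight vector, so each step moves every feature against `sign(w)` and then projects back into the allowed region:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def malware_score(x, w, b):
    """Probability that feature vector x is malware under a toy linear detector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def pgd_evade(x0, w, b, eps=0.3, alpha=0.05, steps=20):
    """L-infinity PGD that lowers the malware score of x0 (b only shifts the logit)."""
    x = list(x0)
    for _ in range(steps):
        # Gradient step: decrease the logit by moving against sign(w).
        x = [xi - alpha * math.copysign(1.0, wi) if wi != 0 else xi for xi, wi in zip(x, w)]
        # Projection: stay within eps of the original features, then inside [0, 1].
        x = [min(max(xi, x0i - eps), x0i + eps) for xi, x0i in zip(x, x0)]
        x = [min(max(xi, 0.0), 1.0) for xi in x]
    return x
```

An adversarial-training defense in the spirit of the framework above would fine-tune the detector on outputs of `pgd_evade` with their original malware labels. The choice of source versus target samples for that step is exactly the design decision the abstract reports as threat-model dependent.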
LG | Dec 12, 2025
CAT: Can Trust be Predicted with Context-Awareness in Dynamic Heterogeneous Networks?
Jie Wang, Zheng Yan, Jiahe Lan et al.
Trust prediction provides valuable support for decision-making, risk mitigation, and system security enhancement. Recently, Graph Neural Networks (GNNs) have emerged as a promising approach for trust prediction, owing to their ability to learn expressive node representations that capture intricate trust relationships within a network. However, current GNN-based trust prediction models face several limitations: (i) Most of them fail to capture trust dynamicity, leading to questionable inferences. (ii) They rarely consider the heterogeneous nature of real-world networks, resulting in a loss of rich semantics. (iii) None of them support context-awareness, a basic property of trust, making prediction results coarse-grained. To this end, we propose CAT, the first Context-Aware GNN-based Trust prediction model that supports trust dynamicity and accurately represents real-world heterogeneity. CAT consists of a graph construction layer, an embedding layer, a heterogeneous attention layer, and a prediction layer. It handles dynamic graphs using continuous-time representations and captures temporal information through a time encoding function. To model graph heterogeneity and leverage semantic information, CAT employs a dual attention mechanism that identifies the importance of different node types and nodes within each type. For context-awareness, we introduce a new notion of meta-paths to extract contextual features. By constructing context embeddings and integrating a context-aware aggregator, CAT can predict both context-aware trust and overall trust. Extensive experiments on three real-world datasets demonstrate that CAT outperforms five groups of baselines in trust prediction, while exhibiting strong scalability to large-scale graphs and robustness against both trust-oriented and GNN-oriented attacks.
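The abstract says CAT captures temporal information through a time encoding function but does not give its form. Temporal GNNs such as TGAT and TGN commonly map an elapsed time to a vector of cosines, cos(ω·Δt + φ), with learnable frequencies and phases. A minimal sketch of that kind of encoding, assuming fixed geometrically spaced frequencies instead of learned ones, is:

```python
import math


def time_encoding(delta_t, dim=8, phases=None):
    """Harmonic encoding cos(omega_i * delta_t + phi_i) of an elapsed time.

    Assumption: frequencies are fixed and log-spaced from 1 down to 1e-4;
    in trained temporal GNNs they (and the phases) are learned parameters.
    """
    phases = phases or [0.0] * dim
    omegas = [1.0 / (10.0 ** (i / max(dim - 1, 1) * 4)) for i in range(dim)]
    return [math.cos(w * delta_t + p) for w, p in zip(omegas, phases)]
```

High frequencies distinguish interactions that are close in time, while low frequencies vary slowly and capture long-range recency. Concatenating such a vector with node or edge features lets attention weigh neighbors by how recently they interacted.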
CR | Dec 15, 2023
FlowMur: A Stealthy and Practical Audio Backdoor Attack with Limited Knowledge
Jiahe Lan, Jie Wang, Baochen Yan et al.
Speech recognition systems driven by DNNs have revolutionized human-computer interaction through voice interfaces, which significantly facilitate our daily lives. However, the growing popularity of these systems also raises particular concerns about their security, especially regarding backdoor attacks. A backdoor attack inserts one or more hidden backdoors into a DNN model during its training process, such that it does not affect the model's performance on benign inputs but forces the model to produce an adversary-desired output if a specific trigger is present in the model input. Despite the initial success of current audio backdoor attacks, they suffer from the following limitations: (i) Most of them require substantial knowledge, which limits their widespread adoption. (ii) They are not stealthy enough and are thus easily detected by humans. (iii) Most of them cannot attack live speech, reducing their practicality. To address these problems, in this paper we propose FlowMur, a stealthy and practical audio backdoor attack that can be launched with limited knowledge. FlowMur constructs an auxiliary dataset and a surrogate model to augment the adversary's knowledge. To achieve dynamicity, it formulates trigger generation as an optimization problem and optimizes the trigger over different attachment positions. To enhance stealthiness, we propose an adaptive data poisoning method based on the Signal-to-Noise Ratio (SNR). Furthermore, ambient noise is incorporated into trigger generation and data poisoning to make FlowMur robust to ambient noise and improve its practicality. Extensive experiments conducted on two datasets demonstrate that FlowMur achieves high attack performance in both digital and physical settings while remaining resilient to state-of-the-art defenses. In particular, a human study confirms that triggers generated by FlowMur are not easily detected by participants.
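SNR-based poisoning relies on controlling how loud the trigger is relative to the speech it is mixed into. Using the definition SNR_dB = 10·log10(P_speech / P_trigger), the sketch below rescales a trigger to hit a chosen SNR. How FlowMur adaptively picks the target SNR per sample is not described in the abstract, so only the scaling step is shown; function names are ours:

```python
import math


def signal_power(samples):
    """Mean squared amplitude of a waveform."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech, trigger):
    return 10.0 * math.log10(signal_power(speech) / signal_power(trigger))


def scale_trigger_to_snr(speech, trigger, target_snr_db):
    """Rescale `trigger` so that snr_db(speech, scaled_trigger) == target_snr_db."""
    target_power = signal_power(speech) / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(target_power / signal_power(trigger))
    return [gain * t for t in trigger]
```

A higher target SNR makes the trigger quieter and harder for listeners to notice, at the cost of a weaker signal for the backdoor to learn. A poisoned sample is then simply the element-wise sum of the speech and the scaled trigger.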
CR | May 29, 2025
LLM Agents Should Employ Security Principles
Kaiyuan Zhang, Zian Su, Pin-Yu Chen et al.
Large Language Model (LLM) agents show considerable promise for automating complex tasks using contextual reasoning; however, interactions involving multiple agents and the system's susceptibility to prompt injection and other forms of context manipulation introduce new vulnerabilities related to privacy leakage and system exploitation. This position paper argues that the well-established design principles in information security, which are commonly referred to as security principles, should be employed when deploying LLM agents at scale. Design principles such as defense-in-depth, least privilege, complete mediation, and psychological acceptability have helped guide the design of mechanisms for securing information systems over the last five decades, and we argue that their explicit and conscientious adoption will help secure agentic systems. To illustrate this approach, we introduce AgentSandbox, a conceptual framework embedding these security principles to provide safeguards throughout an agent's life-cycle. We evaluate with state-of-the-art LLMs along three dimensions: benign utility, attack utility, and attack success rate. AgentSandbox maintains high utility for its intended functions under both benign and adversarial evaluations while substantially mitigating privacy risks. By embedding secure design principles as foundational elements within emerging LLM agent protocols, we aim to promote trustworthy agent ecosystems aligned with user privacy expectations and evolving regulatory requirements.
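Two of the principles named above translate directly into code around an agent's tool calls. Complete mediation means every call passes through one checkpoint, and least privilege means each agent may use only the tools it needs. The sketch below is a hypothetical mediator illustrating those two principles; it is not AgentSandbox, and the agent and tool names are invented for the example:

```python
from dataclasses import dataclass, field


@dataclass
class ToolMediator:
    """Hypothetical tool gateway: every call goes through `call` (complete
    mediation) and is checked against a per-agent allowlist (least privilege)."""

    allowlist: dict[str, set[str]]
    tools: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def register(self, name, fn):
        self.tools[name] = fn

    def call(self, agent, tool, *args, **kwargs):
        allowed = tool in self.allowlist.get(agent, set())
        # Record every decision, allowed or denied, for later review.
        self.audit_log.append((agent, tool, allowed))
        if not allowed:
            raise PermissionError(f"{agent} may not call {tool}")
        return self.tools[tool](*args, **kwargs)
```

For example, a summarization agent granted only `read_file` cannot be steered by an injected prompt into calling `send_email`: the mediator raises `PermissionError`, and the attempt is recorded in `audit_log`.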
CR | Mar 1, 2024
Transfer Learning for Security: Challenges and Future Directions
Adrian Shuai Li, Arun Iyengar, Ashish Kundu et al.
Many machine learning and data mining algorithms rely on the assumption that the training and testing data share the same feature space and distribution. However, this assumption may not always hold. For instance, there are situations where we need to classify data in one domain, but we only have sufficient training data available from a different domain. The latter data may follow a distinct distribution. In such cases, successfully transferring knowledge across domains can significantly improve learning performance and reduce the need for extensive data labeling efforts. Transfer learning (TL) has thus emerged as a promising framework to tackle this challenge, particularly in security-related tasks. This paper aims to review the current advancements in utilizing TL techniques for security. The paper includes a discussion of the existing research gaps in applying TL in the security domain, as well as exploring potential future research directions and issues that arise in the context of TL-assisted security solutions.
CR | Oct 13, 2024
Uncovering Attacks and Defenses in Secure Aggregation for Federated Deep Learning
Yiwei Zhang, Rouzbeh Behnia, Attila A. Yavuz et al.
Federated learning enables the collaborative learning of a global model on diverse data, preserving data locality and eliminating the need to transfer user data to a central server. However, data privacy remains vulnerable, as attacks can target user training data by exploiting the updates sent by users during each learning iteration. Secure aggregation protocols are designed to mask or encrypt user updates and enable a central server to aggregate the masked information. MicroSecAgg (PoPETS 2024) proposes a single-server secure aggregation protocol that aims to mitigate the high communication complexity of existing approaches by enabling a one-time setup of the secret that is reused across multiple training iterations. In this paper, we identify a security flaw in MicroSecAgg that undermines its privacy guarantees. We detail the security flaw and our attack, demonstrating how an adversary can exploit predictable masking values to compromise user privacy. Our findings highlight the critical need for enhanced security measures in secure aggregation protocols, particularly dynamic and unpredictable masking strategies. We propose potential countermeasures to mitigate these vulnerabilities and ensure robust privacy protection in secure aggregation frameworks.
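Why predictable masks are dangerous can be seen in a toy version of pairwise masking. Two users derive the same mask from a shared seed; one adds it and the other subtracts it, so the masks cancel in the server's sum. If the same mask is reused in a later round, the server can subtract one user's two masked updates and learn the difference of that user's private values. This is a generic illustration, not the specific MicroSecAgg flaw the paper details. `random.Random` stands in for a cryptographic PRG purely for brevity:

```python
import random

MODULUS = 2**32  # updates encoded as integers mod 2^32 (illustrative)


def pairwise_mask(shared_seed: int) -> int:
    """Both users derive the same mask from a seed agreed once at setup.

    Not cryptographically secure: real protocols use a PRG keyed by an agreed secret.
    """
    return random.Random(shared_seed).randrange(MODULUS)


def mask_updates(x_a: int, x_b: int, mask: int) -> tuple[int, int]:
    """User A adds the mask and user B subtracts it, so the masks cancel in the sum."""
    return (x_a + mask) % MODULUS, (x_b - mask) % MODULUS


seed = 1234  # one-time setup, reused in every round (the weakness)
a1, b1 = mask_updates(10, 20, pairwise_mask(seed))  # round 1 private updates: 10, 20
a2, b2 = mask_updates(13, 25, pairwise_mask(seed))  # round 2 private updates: 13, 25

aggregate_round1 = (a1 + b1) % MODULUS  # 30: the intended output
leaked_difference = (a1 - a2) % MODULUS  # equals (10 - 13) mod 2^32: a leak
```

Deriving a fresh, unpredictable mask per round (for example, by keying the PRG with the round number as well) removes this particular cancellation, in line with the paper's call for dynamic masking.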
NI | Dec 17, 2024
TIMESAFE: Timing Interruption Monitoring and Security Assessment for Fronthaul Environments
Joshua Groen, Simone Di Valerio, Imtiaz Karim et al.
5G and beyond cellular systems embrace the disaggregation of Radio Access Network (RAN) components, exemplified by the evolution of the fronthaul (FH) connection between cellular baseband and radio unit equipment. Crucially, synchronization over the FH is pivotal for reliable 5G services. In recent years, there has been a push to move these links to an Ethernet-based packet network topology, leveraging existing standards and ongoing research for Time-Sensitive Networking (TSN). However, TSN standards, such as Precision Time Protocol (PTP), focus on performance with little to no concern for security. This increases the exposure of the open FH to security risks. Attacks targeting synchronization mechanisms pose significant threats, potentially disrupting 5G networks and impairing connectivity. In this paper, we demonstrate the impact of successful spoofing and replay attacks against PTP synchronization. We show how a spoofing attack is able to cause a production-ready O-RAN and 5G-compliant private cellular base station to catastrophically fail within 2 seconds of the attack, necessitating manual intervention to restore full network operations. To counter this, we design a Machine Learning (ML)-based monitoring solution capable of detecting various malicious attacks with over 97.5% accuracy.
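The vulnerability of PTP to delay and replay manipulation follows from how a slave computes its clock offset. The offset comes from four timestamps of a two-way exchange, under the assumption that the network path is symmetric. The sketch below shows that standard computation (simplified: correction fields and filtering are omitted). It is not TIMESAFE's detector:

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """IEEE 1588 two-way exchange, simplified.

    t1: Sync sent (master clock)        t2: Sync received (slave clock)
    t3: Delay_Req sent (slave clock)    t4: Delay_Req received (master clock)
    Assumes the one-way path delay is the same in both directions.
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2
    mean_path_delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, mean_path_delay
```

With a slave clock 5 units ahead and a true one-way delay of 10, the timestamps (0, 15, 20, 25) give an offset of 5.0 and a delay of 10.0. If an attacker holds back only the Sync message by 4 units, t2 becomes 19 and the estimated offset becomes 7.0. The slave then steers its clock by the wrong amount (half the injected asymmetry), which is how timing attacks degrade fronthaul synchronization.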
CR | Dec 20, 2023
Graphene: Infrastructure Security Posture Analysis with AI-generated Attack Graphs
Xin Jin, Charalampos Katsis, Fan Sang et al.
The rampant occurrence of cybersecurity breaches imposes substantial limitations on the progress of network infrastructures, leading to compromised data, financial losses, potential harm to individuals, and disruptions in essential services. The current security landscape demands the urgent development of a holistic security assessment solution that encompasses vulnerability analysis and investigates how these vulnerabilities could be exploited along attack paths. In this paper, we propose Graphene, an advanced system designed to provide a detailed analysis of the security posture of computing infrastructures. Using user-provided information, such as device details and software versions, Graphene performs a comprehensive security assessment. This assessment includes identifying associated vulnerabilities and constructing potential attack graphs that adversaries can exploit. Furthermore, Graphene evaluates the exploitability of these attack paths and quantifies the overall security posture through a scoring mechanism. The system takes a holistic approach by analyzing security layers encompassing hardware, system, network, and cryptography. It also examines the interconnections between these layers, exploring how vulnerabilities in one layer can be leveraged to exploit vulnerabilities in others. We present the end-to-end pipeline implemented in Graphene and the systematic approach it follows to conduct this security analysis.
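The abstract does not describe Graphene's scoring mechanism. A common way to reason about attack graphs is to treat each edge as an exploit with some success probability and score each path by the product of its edge probabilities, assuming independence. The sketch below enumerates simple attack paths and scores them that way; the graph, node names, and probabilities are hypothetical:

```python
def attack_paths(graph, source, target, path=None):
    """Yield simple paths from source to target.

    `graph` is a dict mapping node -> {neighbor: exploit success probability}.
    """
    path = (path or []) + [source]
    if source == target:
        yield path
        return
    for nxt in graph.get(source, {}):
        if nxt not in path:  # skip cycles
            yield from attack_paths(graph, nxt, target, path)


def path_exploitability(graph, path):
    """Probability that every exploit along the path succeeds (independence assumed)."""
    prob = 1.0
    for u, v in zip(path, path[1:]):
        prob *= graph[u][v]
    return prob


# Hypothetical infrastructure: edges are exploitable steps between components.
example_graph = {
    "internet": {"web_server": 0.9},
    "web_server": {"db": 0.5, "app": 0.8},
    "app": {"db": 0.6},
}
```

Here the direct route internet → web_server → db scores 0.45 and the route through the app server scores 0.432. A posture score could then be derived from the most exploitable path, for instance 1 minus its probability, though that particular formula is our assumption.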
LG | Mar 12, 2025
How Feasible is Augmenting Fake Nodes with Learnable Features as a Counter-strategy against Link Stealing Attacks?
Mir Imtiaz Mostafiz, Imtiaz Karim, Elisa Bertino
Graph Neural Networks (GNNs) are widely used and deployed for graph-based prediction tasks. However, as good as GNNs are at learning from graph data, they also carry a risk of privacy leakage. For instance, an attacker can run carefully crafted queries on a GNN and, from the responses, infer whether an edge exists between a pair of nodes. This attack, dubbed a "link-stealing" attack, can jeopardize user privacy by leaking potentially sensitive information. To protect against this attack, we propose an approach called Node Augmentation for Restricting Graphs from Insinuating their Structure (NARGIS) and study its feasibility. NARGIS reshapes the graph embedding space so that the posteriors from the GNN model still provide utility for the prediction task but introduce ambiguity for link-stealing attackers. To this end, NARGIS applies spectral clustering to the given graph to guide its augmentation with new nodes that have learned, rather than fixed, features. It uses tri-level optimization to learn the parameters of the GNN model, a surrogate attacker model, and our defense model (i.e., the learnable node features). We extensively evaluate NARGIS on three benchmark citation datasets over eight knowledge-availability settings for the attackers. We also evaluate model fidelity and defense performance against influence-based link inference attacks. Our studies show that the main strength of NARGIS is its superior fidelity-privacy trade-off in a significant number of cases. We also identify the cases in which the model needs improvement and propose ways to integrate different schemes to make it more robust against link-stealing attacks.
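The signal a defense like NARGIS must blur can be seen in the simplest, unsupervised form of link stealing. Because message passing makes connected nodes receive similar outputs, an attacker compares the GNN's posteriors for two nodes and guesses that an edge exists when they are similar enough. A minimal sketch using cosine similarity, with a hypothetical threshold of 0.9, follows. Published attacks use several distance metrics and trained attack models:

```python
import math


def cosine_similarity(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def infer_link(posterior_u, posterior_v, threshold=0.9):
    """Guess that nodes u and v are connected if their GNN posteriors are similar.

    The threshold is a hypothetical value; an attacker would calibrate it.
    """
    return cosine_similarity(posterior_u, posterior_v) >= threshold
```

For instance, posteriors `[0.8, 0.1, 0.1]` and `[0.7, 0.2, 0.1]` (cosine about 0.99) would be flagged as linked, while `[0.8, 0.1, 0.1]` and `[0.1, 0.1, 0.8]` (cosine about 0.26) would not. Reshaping the embedding space aims to make such similarities uninformative about edges while keeping the argmax predictions useful.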
LG | Oct 23, 2024
Adversarial Domain Adaptation for Metal Cutting Sound Detection: Leveraging Abundant Lab Data for Scarce Industry Data
Mir Imtiaz Mostafiz, Eunseob Kim, Adrian Shuai Li et al.
Cutting state monitoring in the milling process is crucial for improving manufacturing efficiency and tool life. Cutting sound detection using machine learning (ML) models, inspired by experienced machinists, can be employed as a cost-effective and non-intrusive monitoring method in a complex manufacturing environment. However, labeling industry data for training is costly and time-consuming. Moreover, industry data is often scarce. In this study, we propose a novel adversarial domain adaptation (DA) approach that leverages abundant lab data to learn from scarce industry data, both labeled, for training a cutting-sound detection model. Rather than adapting the features from separate domains directly, we first project them into two separate latent spaces that jointly serve as the feature space for learning domain-independent representations. We also analyze two different mechanisms for adversarial learning, in which the discriminator acts as an adversary and as a critic, respectively, enabling our model to learn expressive domain-invariant and domain-ingrained features. We collected cutting sound data from multiple sensors in different locations, prepared datasets from the lab and industry domains, and evaluated our learning models on them. Experiments showed that our models outperformed vanilla domain adaptation models based on multi-layer perceptrons in labeling tasks on the curated datasets, achieving nearly 92%, 82%, and 85% accuracy, respectively, for three different sensors installed in industry settings.
CV | May 31, 2023
Building Manufacturing Deep Learning Models with Minimal and Imbalanced Training Data Using Domain Adaptation and Data Augmentation
Adrian Shuai Li, Elisa Bertino, Rih-Teng Wu et al.
Deep learning (DL) techniques are highly effective for defect detection from images. Training DL classification models, however, requires vast amounts of labeled data, which is often expensive to collect. In many cases, the available training data is not only limited but may also be imbalanced. In this paper, we propose a novel domain adaptation (DA) approach to address the scarcity of labeled training data for a target learning task by transferring knowledge gained from an existing source dataset used for a similar learning task. Our approach works for scenarios where the source dataset and the dataset available for the target learning task have the same or different feature spaces. We combine our DA approach with an autoencoder-based data augmentation approach to address the problem of imbalanced target datasets. We evaluate our combined approach using image data for wafer defect prediction. The experiments show that it outperforms other algorithms when the number of labeled samples in the target dataset is very small and the target dataset is imbalanced.
CRJan 23, 2022
Are Your Sensitive Attributes Private? Novel Model Inversion Attribute Inference Attacks on Classification ModelsShagufta Mehnaz, Sayanton V. Dibbo, Ehsanul Kabir et al.
Increasing use of machine learning (ML) technologies in privacy-sensitive domains such as medical diagnoses, lifestyle predictions, and business decisions highlights the need to better understand whether these ML technologies introduce leakage of sensitive and proprietary training data. In this paper, we focus on model inversion attacks where the adversary knows non-sensitive attributes about records in the training data and aims to infer the value of a sensitive attribute unknown to the adversary, using only black-box access to the target classification model. We first devise a novel confidence score-based model inversion attribute inference attack that significantly outperforms the state of the art. We then introduce a label-only model inversion attack that relies only on the model's predicted labels but still matches our confidence score-based attack in terms of attack effectiveness. We also extend our attacks to the scenario where some of the other (non-sensitive) attributes of a target record are unknown to the adversary. We evaluate our attacks on two types of machine learning models, decision trees and deep neural networks, trained on three real datasets. Moreover, we empirically demonstrate disparate vulnerability to model inversion attacks, i.e., specific groups in the training dataset (grouped by gender, race, etc.) could be more vulnerable to model inversion attacks.
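A simplified illustration of the confidence-score idea, assuming (as an illustration) that the adversary knows the target record's true label and can query per-class confidences: try each candidate value of the sensitive attribute, query the model, and keep the value under which the model is most confident in the true label. This is a basic baseline of the attack family, not the paper's full attack; `predict_proba` and the toy model are hypothetical.

```python
from typing import Callable, Dict, Hashable, Optional, Sequence

def confidence_score_inversion(
    predict_proba: Callable[[Dict[str, Hashable]], Dict[Hashable, float]],
    known: Dict[str, Hashable],
    sensitive_attr: str,
    candidates: Sequence[Hashable],
    true_label: Hashable,
) -> Optional[Hashable]:
    """Guess a sensitive attribute using only black-box confidence queries."""
    best_value, best_conf = None, float("-inf")
    for value in candidates:
        # Complete the partially known record with the candidate value.
        record = dict(known, **{sensitive_attr: value})
        conf = predict_proba(record).get(true_label, 0.0)
        if conf > best_conf:
            best_value, best_conf = value, conf
    return best_value
```

With a hypothetical model that is more confident in label "yes" for married applicants, querying with `true_label="yes"` recovers `"married"` as the most likely sensitive value.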
RODec 7, 2021
Control Parameters Considered Harmful: Detecting Range Specification Bugs in Drone Configuration Modules via Learning-Guided SearchRuidong Han, Chao Yang, Siqi Ma et al.
In order to support a variety of missions and deal with different flight environments, drone control programs typically provide configurable control parameters. However, such flexibility introduces vulnerabilities. One such vulnerability, referred to as range specification bugs, was recently identified. The vulnerability originates from the fact that even though each individual parameter receives a value in the recommended value range, certain combinations of parameter values may affect the drone's physical stability. In this paper, we develop a novel learning-guided search system to find such combinations, which we refer to as incorrect configurations. Our system applies metaheuristic search algorithms that mutate configurations to detect parameter values that drive the drone into unstable physical states. To guide the mutations, our system leverages a machine learning predictor as the fitness evaluator. Finally, by utilizing multi-objective optimization, our system returns the feasible ranges based on the mutation search results. Because the mutations are guided by a predictor, evaluating the parameter configurations does not require real or simulated executions. Therefore, our system supports a comprehensive yet efficient detection of incorrect configurations. Our experimental evaluation shows that the system successfully reports potentially incorrect configurations, over 85% of which lead to actual unstable physical states.
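The search loop described above can be sketched as a simple evolutionary search in which a learned predictor, rather than a flight test, scores each configuration. Everything here is an assumption for illustration: Gaussian mutation clipped to each parameter's recommended range, truncation selection, a hypothetical `predictor` returning an instability score, and a fixed threshold. The paper's multi-objective step that derives feasible ranges is not shown.

```python
import numpy as np
from typing import Callable

def guided_search(
    ranges: np.ndarray,                             # (n_params, 2) recommended [low, high]
    predictor: Callable[[np.ndarray], np.ndarray],  # batch of configs -> instability scores
    pop_size: int = 32,
    generations: int = 50,
    sigma: float = 0.1,
    threshold: float = 0.5,
    seed: int = 0,
) -> np.ndarray:
    """Return configurations the predictor flags as unstable (score > threshold)."""
    rng = np.random.default_rng(seed)
    low, high = ranges[:, 0], ranges[:, 1]
    pop = rng.uniform(low, high, size=(pop_size, len(low)))
    for _ in range(generations):
        # Mutate with noise scaled to each parameter's range, staying inside the
        # recommended range so every individual value remains "valid".
        children = np.clip(pop + rng.normal(0, sigma, pop.shape) * (high - low), low, high)
        merged = np.vstack([pop, children])
        scores = predictor(merged)                     # learned fitness, no simulation
        pop = merged[np.argsort(scores)[-pop_size:]]   # keep the most unstable
    return pop[predictor(pop) > threshold]
```

With a toy predictor that flags configurations whose first two parameters sum above 1.5, the search returns only such combinations, even though each value lies in its own recommended range.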
LGMar 13, 2021
Simeon -- Secure Federated Machine Learning Through Iterative FilteringNicholas Malecki, Hye-young Paik, Aleksandar Ignjatovic et al.
Federated learning enables a global machine learning model to be trained collaboratively by distributed, mutually non-trusting learning agents who desire to maintain the privacy of their training data and their hardware. A global model is distributed to clients, who perform training and submit their newly trained models to be aggregated into a superior model. However, federated learning systems are vulnerable to interference from malicious learning agents who may desire to prevent training or induce targeted misclassification in the resulting global model. A class of Byzantine-tolerant aggregation algorithms has emerged, offering varying degrees of robustness against these attacks, often with the caveat that the number of attackers is bounded by some quantity known prior to training. This paper presents Simeon, a novel approach to aggregation that applies a reputation-based iterative filtering technique to achieve robustness even in the presence of attackers who can exhibit arbitrary behaviour. We compare Simeon to state-of-the-art aggregation techniques and find that Simeon achieves comparable or superior robustness to a variety of attacks. Notably, we show that Simeon is tolerant to Sybil attacks, where other algorithms are not, presenting a key advantage of our approach.
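A generic reputation-based iterative filter for aggregation, as a minimal sketch: repeatedly compute a reputation-weighted mean of client updates and reassign each client a reputation inversely related to its distance from that mean. The specific discriminant (squared distance plus the median distance, which keeps weight from collapsing onto a single client) is an assumption of this sketch, not Simeon's published rule.

```python
import numpy as np
from typing import Tuple

def iterative_filtering(updates: np.ndarray, iters: int = 20, eps: float = 1e-9) -> Tuple[np.ndarray, np.ndarray]:
    """Aggregate client updates of shape (n_clients, n_params) with reputations.

    Returns (aggregate, reputations); reputations sum to 1.
    """
    n = len(updates)
    weights = np.full(n, 1.0 / n)  # start with equal reputation
    for _ in range(iters):
        agg = weights @ updates
        dist = ((updates - agg) ** 2).sum(axis=1)
        # Far-away updates get low reputation; the median offset bounds how much
        # more weight the closest client can receive than the others.
        inv = 1.0 / (dist + np.median(dist) + eps)
        new_weights = inv / inv.sum()
        converged = np.allclose(new_weights, weights)
        weights = new_weights
        if converged:
            break
    return weights @ updates, weights
```

With eight honest clients near zero and two attackers submitting large constant updates, the attackers' reputations shrink toward zero within a few iterations, so the aggregate stays close to the honest mean.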
CRMar 6, 2021
Fine with "1234"? An Analysis of SMS One-Time Password Randomness in Android AppsSiqi Ma, Juanru Li, Hyoungshick Kim et al.
A fundamental premise of SMS One-Time Password (OTP) authentication is that the pseudo-random numbers (PRNs) used are unique and unpredictable for each login session. Hence, generating PRNs is the most critical step in OTP authentication. An improper implementation of the pseudo-random number generator (PRNG) will result in predictable or even static OTP values, making them vulnerable to potential attacks. In this paper, we present a vulnerability study of PRNGs implemented for Android apps. A key challenge is that PRNGs are typically implemented on the server side, and thus their source code is not accessible. To resolve this issue, we build an analysis tool that assesses PRNG implementations in an automated manner without requiring source code. Through reverse engineering, the tool identifies the apps using SMS OTP and triggers each app's login functionality to retrieve OTP values. It further assesses the randomness of the OTP values to identify vulnerable PRNGs. By analyzing 6,431 commercially used Android apps downloaded from Google Play and Tencent Myapp, the tool identified 399 vulnerable apps that generate predictable OTP values. Even worse, 194 vulnerable apps use OTP authentication alone without any additional security mechanisms, leaving their authentication vulnerable to guessing and replay attacks.
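The abstract does not list the randomness tests used, so the following is only a hypothetical sketch of the kind of black-box checks one might run on a sample of collected OTP values: a static-value check, a repeated-value check, and a chi-square test of digit frequencies (critical value 21.666 for 9 degrees of freedom at alpha = 0.01). These checks are deliberately crude; for example, they do not detect sequential patterns, and repeats can occur by chance in large samples.

```python
from collections import Counter
from typing import List

CHI2_CRIT_DF9_ALPHA01 = 21.666  # chi-square critical value, 9 d.o.f., alpha = 0.01

def assess_otps(otps: List[str]) -> List[str]:
    """Return detected weaknesses in a sample of collected OTP strings."""
    issues = []
    distinct = len(set(otps))
    if distinct == 1:
        issues.append("static")
    elif distinct < len(otps):
        issues.append("repeated")  # may also happen by chance for large samples
    digits = "".join(otps)
    expected = len(digits) / 10
    counts = Counter(digits)
    # Pearson chi-square statistic against a uniform digit distribution.
    chi2 = sum((counts.get(str(d), 0) - expected) ** 2 / expected for d in range(10))
    if chi2 > CHI2_CRIT_DF9_ALPHA01:
        issues.append("non-uniform digits")
    return issues
```

An app that always sends "1234" is flagged as both static and non-uniform, echoing the paper's title.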
CYDec 10, 2020
Artificial Intelligence & CooperationElisa Bertino, Finale Doshi-Velez, Maria Gini et al.
The rise of Artificial Intelligence (AI) will bring with it an ever-increasing willingness to cede decision-making to machines. But rather than just giving machines the power to make decisions that affect us, we need ways to work cooperatively with AI systems. There is a vital need for research in "AI and Cooperation" that seeks to understand the ways in which systems of AIs, and systems of AIs with people, can engender cooperative behavior. Trust in AI is also key: trust that is intrinsic and trust that can only be earned over time. Here we use the term "AI" in its broadest sense, as employed by the recent 20-Year Community Roadmap for AI Research (Gil and Selman, 2019), including, but certainly not limited to, recent advances in deep learning. With success, cooperation between humans and AIs can build society just as human-human cooperation has. Whether coming from an intrinsic willingness to be helpful or driven by self-interest, human societies have grown strong and the human species has found success through cooperation. We cooperate "in the small" (as family units, with neighbors, with co-workers, with strangers) and "in the large" as a global community that seeks cooperative outcomes around questions of commerce, climate change, and disarmament. Cooperation has also evolved in nature, in cells and among animals. While many cases involving cooperation between humans and AIs will be asymmetric, with the human ultimately in control, AI systems are growing so complex that, even today, it is impossible for humans to fully comprehend their reasoning, recommendations, and actions when functioning simply as passive observers.
CYDec 10, 2020
Artificial Intelligence at the EdgeElisa Bertino, Sujata Banerjee
The Internet of Things (IoT) and edge computing applications aim to support a variety of societal needs, including the global pandemic situation that the entire world is currently experiencing and responses to natural disasters. The need for real-time interactive applications such as immersive video conferencing, augmented/virtual reality, and autonomous vehicles, in education, healthcare, disaster recovery, and other domains, has never been higher. At the same time, there have been recent technological breakthroughs in highly relevant fields such as artificial intelligence (AI)/machine learning (ML), advanced communication systems (5G and beyond), privacy-preserving computations, and hardware accelerators. 5G mobile communication networks increase communication capacity, reduce transmission latency and errors, and save energy, capabilities that are essential for new applications. The envisioned future 6G technology will integrate many more technologies, including, for example, visible light communication, to support groundbreaking applications such as holographic communications and high-precision manufacturing. Many of these applications require computations and analytics close to application end-points, that is, at the edge of the network rather than in a centralized cloud. AI techniques applied at the edge have tremendous potential both to power new applications and to enable more efficient operation of edge infrastructure. However, it is critical to understand where to deploy AI systems within complex ecosystems consisting of advanced applications, as well as the specific real-time requirements placed on AI systems.
CRDec 7, 2020
Black-box Model Inversion Attribute Inference Attacks on Classification ModelsShagufta Mehnaz, Ninghui Li, Elisa Bertino
Increasing use of ML technologies in privacy-sensitive domains such as medical diagnoses, lifestyle predictions, and business decisions highlights the need to better understand whether these ML technologies introduce leakage of sensitive and proprietary training data. In this paper, we focus on one kind of model inversion attack, where the adversary knows non-sensitive attributes about instances in the training data and aims to infer the value of a sensitive attribute unknown to the adversary, using oracle access to the target classification model. We devise two novel model inversion attribute inference attacks, namely a confidence modeling-based attack and a confidence score-based attack, and also extend our attacks to the case where some of the other (non-sensitive) attributes are unknown to the adversary. Furthermore, while previous work uses accuracy as the metric to evaluate the effectiveness of attribute inference attacks, we find that accuracy is not informative when the sensitive attribute distribution is unbalanced. We identify two metrics that are better suited to evaluating attribute inference attacks, namely G-mean and the Matthews correlation coefficient (MCC). We evaluate our attacks on two types of machine learning models, decision trees and deep neural networks, trained on two real datasets. Experimental results show that our newly proposed attacks significantly outperform the state-of-the-art attacks. Moreover, we empirically show that specific groups in the training dataset (grouped by attributes, e.g., gender, race) could be more vulnerable to model inversion attacks. We also demonstrate that the performance of our attacks is not significantly affected when some of the other (non-sensitive) attributes are also unknown to the adversary.
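Why accuracy misleads on an unbalanced sensitive attribute can be shown with the standard binary definitions of G-mean, $\sqrt{\mathrm{TPR}\cdot\mathrm{TNR}}$, and MCC, $\frac{TP\cdot TN - FP\cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$. The sketch below assumes a binary sensitive attribute and sets MCC to 0 when its denominator vanishes (a common convention); the confusion-matrix counts in the usage note are made up for illustration.

```python
import math
from typing import Dict

def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, G-mean, and Matthews correlation coefficient from a confusion matrix."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    tpr = tp / (tp + fn) if tp + fn else 0.0  # sensitivity (recall)
    tnr = tn / (tn + fp) if tn + fp else 0.0  # specificity
    g_mean = math.sqrt(tpr * tnr)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if undefined
    return {"accuracy": acc, "g_mean": g_mean, "mcc": mcc}
```

With 5 positives among 100 records, an attack that always guesses the majority value scores 95% accuracy but a G-mean and MCC of 0, whereas an attack that finds 4 of the 5 positives at the cost of 6 false positives has lower accuracy (93%) but a G-mean of about 0.87 and an MCC of about 0.54.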
CROct 19, 2020
FLAP -- A Federated Learning Framework for Attribute-based Access Control PoliciesAmani Abu Jabal, Elisa Bertino, Jorge Lobo et al.
Technological advances in areas such as sensors, IoT, and robotics enable new collaborative applications (e.g., autonomous devices). A primary requirement for such collaborations is a secure system that enables information sharing and information flow protection. Policy-based management systems are a key mechanism for secure selective sharing of protected resources. However, the policies of each party in such a collaborative environment cannot be static, as they have to adapt to different contexts and situations. One advantage of collaborative applications is that each party in the collaboration can leverage the knowledge of the other parties to learn or enhance its own policies. We refer to this learning mechanism as policy transfer. The design of a policy transfer framework poses challenges, including policy conflicts and privacy issues. Policy conflicts typically arise because of differences in the obligations of the parties, whereas privacy issues arise because of data sharing constraints for sensitive data. Hence, the policy transfer framework should be able to tackle such challenges by minimizing data sharing and supporting policy adaptation to resolve conflicts. In this paper, we propose a framework that aims to address such challenges. We introduce a formal definition of the policy transfer problem for attribute-based policies. We then introduce the transfer methodology, which consists of three sequential steps. Finally, we report experimental results.
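To make "policy conflicts" concrete for attribute-based policies, here is a deliberately simplified, hypothetical model: a policy is a set of required attribute values, an action, and a decision; two policies conflict if some request could satisfy both (same action, no contradicting shared attribute) while their decisions differ. The `transfer` function simply adopts remote policies that do not conflict with local ones. This is an illustration of the problem, not FLAP's three-step transfer methodology.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Policy:
    conditions: Dict[str, str]  # attribute -> required value (hypothetical encoding)
    action: str
    decision: str               # "permit" or "deny"

def overlaps(a: Policy, b: Policy) -> bool:
    """True if some request could satisfy both policies' conditions."""
    shared = a.conditions.keys() & b.conditions.keys()
    return a.action == b.action and all(a.conditions[k] == b.conditions[k] for k in shared)

def conflicts(a: Policy, b: Policy) -> bool:
    """Overlapping policies with different decisions are in conflict."""
    return overlaps(a, b) and a.decision != b.decision

def transfer(local: List[Policy], remote: List[Policy]) -> List[Policy]:
    """Adopt remote policies that are new and conflict with no local policy."""
    adopted = [
        r for r in remote
        if r not in local and not any(conflicts(r, loc) for loc in local)
    ]
    return local + adopted
```

A remote rule denying nurses read access in the ICU conflicts with a local rule permitting nurses to read, so it is rejected, while an unrelated rule for doctors' write access is adopted.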
CYMar 30, 2020
5G Security and Privacy: A Research RoadmapElisa Bertino, Syed Rafiul Hussain, Omar Chowdhury
Cellular networks represent a critical infrastructure, and their security is thus crucial. 5G, the latest generation of cellular networks, combines different technologies to increase capacity, reduce latency, and save energy. Due to its complexity and scale, however, ensuring its security is extremely challenging. In this white paper, we outline recent approaches supporting systematic analyses of 4G LTE and 5G protocols and their related defenses, and introduce an initial security and privacy roadmap covering different research challenges, including formal and comprehensive analyses of cellular protocols as defined by the standardization groups, verification of the software implementing the protocols, the design of robust defenses, and application and device security.
CRJul 7, 2018
Gargoyle: A Network-based Insider Attack Resilient Framework for OrganizationsArash Shaghaghi, Salil S. Kanhere, Mohamed Ali Kaafar et al.
The 'Anytime, Anywhere' data access model has become a widespread IT policy in organizations, making insider attacks even more complicated to model, predict, and deter. Here, we propose Gargoyle, a network-based insider attack resilient framework against the most complex insider threats within a pervasive computing context. Compared to existing solutions, Gargoyle evaluates the trustworthiness of an access request context through a new set of contextual attributes called Network Context Attributes (NCAs). NCAs are extracted from the network traffic and include information such as the user's device capabilities, security level, current and prior interactions with other devices, network connection status, and suspicious online activities. Retrieving such information from the user's device and its integrated sensors is challenging in terms of device performance overhead, sensor costs, availability, reliability, and trustworthiness. To address these issues, Gargoyle leverages the capabilities of Software-Defined Networking (SDN) for both policy enforcement and implementation. Specifically, Gargoyle's SDN App can interact with the network controller to create a 'defence-in-depth' protection system. For instance, Gargoyle can automatically quarantine a suspicious data requestor in the enterprise network for further investigation or filter out an access request before engaging a data provider. Finally, instead of employing simplistic binary rules in access authorizations, Gargoyle incorporates Function-based Access Control (FBAC) and supports the customization of access policies into a set of functions (e.g., disabling copy, allowing print) depending on the perceived trustworthiness of the context.
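The contrast between binary rules and function-based access control can be sketched as a mapping from a context trust score to a set of permitted functions. Everything below is hypothetical: the NCA names, the weights, the normalization of each attribute to [0, 1], and the tier thresholds are illustrative assumptions, not Gargoyle's actual trust model. An empty result stands in for quarantining the requestor.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical NCA weights; each attribute is normalized to [0, 1], 1 = trustworthy.
NCA_WEIGHTS: Dict[str, float] = {
    "device_security_level": 0.4,
    "connection_security": 0.3,
    "no_suspicious_activity": 0.3,
}

# Hypothetical function tiers, checked from most to least trusted.
FUNCTION_TIERS: List[Tuple[float, FrozenSet[str]]] = [
    (0.8, frozenset({"view", "print", "copy"})),
    (0.5, frozenset({"view", "print"})),  # e.g., disabling copy, allowing print
    (0.3, frozenset({"view"})),
]

def trust_score(nca: Dict[str, float]) -> float:
    """Weighted sum of normalized NCAs; missing attributes count as untrusted."""
    return sum(w * nca.get(name, 0.0) for name, w in NCA_WEIGHTS.items())

def authorize(nca: Dict[str, float]) -> FrozenSet[str]:
    """Return the functions allowed in this context; empty means quarantine."""
    score = trust_score(nca)
    for threshold, functions in FUNCTION_TIERS:
        if score >= threshold:
            return functions
    return frozenset()
```

A fully trusted context unlocks view, print, and copy; the same device showing suspicious online activity drops to view and print only.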
CRApr 22, 2015
Differentially Private $k$-Means ClusteringDong Su, Jianneng Cao, Ninghui Li et al.
There are two broad approaches to differentially private data analysis. The interactive approach aims at developing customized differentially private algorithms for various data mining tasks. The non-interactive approach aims at developing differentially private algorithms that can output a synopsis of the input dataset, which can then be used to support various data mining tasks. In this paper, we study the tradeoff between the interactive and non-interactive approaches and propose a hybrid approach that combines the two, using $k$-means clustering as an example. In the hybrid approach to differentially private $k$-means clustering, one first uses a non-interactive mechanism to publish a synopsis of the input dataset, then applies the standard $k$-means clustering algorithm to learn $k$ cluster centroids, and finally uses an interactive approach to further improve these cluster centroids. We analyze the error behavior of both the non-interactive and interactive approaches and use this analysis to decide how to allocate the privacy budget between the non-interactive step and the interactive step. Results from extensive experiments support our analysis and demonstrate the effectiveness of our approach.
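The three-step hybrid pipeline can be sketched as follows, under assumptions that are ours rather than the paper's: data scaled to $[0,1]^d$, a uniform-grid histogram with Laplace noise as the non-interactive synopsis, weighted Lloyd iterations on the noisy cell centers (post-processing, so no extra budget), and DP Lloyd refinements that add Laplace noise to each cluster's count and coordinate sum. The fixed `grid`, `split`, and `interactive_iters` values are illustrative; the paper derives the budget allocation from its error analysis.

```python
import numpy as np

def weighted_kmeans(points, weights, k, iters, rng):
    """Plain weighted Lloyd's algorithm (no privacy; used on the DP synopsis)."""
    centers = points[rng.choice(len(points), size=k, replace=False, p=weights / weights.sum())]
    for _ in range(iters):
        labels = np.argmin(((points[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            w = weights[labels == j]
            if w.sum() > 0:
                centers[j] = (w[:, None] * points[labels == j]).sum(0) / w.sum()
    return centers

def hybrid_dp_kmeans(X, k, epsilon, grid=10, interactive_iters=2, split=0.5, seed=0):
    """Hybrid DP k-means sketch; X must lie in [0, 1]^d."""
    rng = np.random.default_rng(seed)
    _, d = X.shape
    eps1, eps2 = split * epsilon, (1 - split) * epsilon
    # Step 1 (non-interactive): noisy grid histogram; each point changes one count.
    hist, edges = np.histogramdd(X, bins=grid, range=[(0, 1)] * d)
    noisy = np.clip(hist + rng.laplace(0, 1 / eps1, hist.shape), 0, None)
    mids = [(e[:-1] + e[1:]) / 2 for e in edges]
    cells = np.stack(np.meshgrid(*mids, indexing="ij"), -1).reshape(-1, d)
    # Step 2: standard (weighted) k-means on the synopsis.
    centers = weighted_kmeans(cells, noisy.ravel(), k, iters=10, rng=rng)
    # Step 3 (interactive): DP Lloyd steps with noisy per-cluster counts and sums.
    eps_iter = eps2 / interactive_iters
    scale = (d + 1) / eps_iter  # L1 sensitivity of (count, sum) for data in [0,1]^d
    for _ in range(interactive_iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            members = X[labels == j]
            cnt = len(members) + rng.laplace(0, scale)
            total = members.sum(0) + rng.laplace(0, scale, d)
            if cnt > 1:  # skip clusters whose noisy count is unreliable
                centers[j] = np.clip(total / cnt, 0, 1)
    return centers
```

Because each record falls into exactly one grid cell and one cluster, the per-step noise scales above follow from the usual sensitivity argument; the total budget spent is `eps1 + eps2 = epsilon`.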
CRDec 14, 2014
Privacy-Preserving and Outsourced Multi-User k-Means ClusteringBharath K. Samanthula, Fang-Yu Rao, Elisa Bertino et al.
Many techniques for privacy-preserving data mining (PPDM) have been investigated over the past decade. Often, the entities involved in the data mining process are end-users or organizations with limited computing and storage resources. As a result, such entities may want to refrain from participating in the PPDM process. To overcome this issue and to reap the other benefits of cloud computing, outsourcing PPDM tasks to the cloud environment has recently gained special attention. We consider the scenario where n entities outsource their databases (in encrypted format) to the cloud and ask the cloud to perform the clustering task on their combined data in a privacy-preserving manner. We term such a process privacy-preserving and outsourced distributed clustering (PPODC). In this paper, we propose a novel and efficient solution to the PPODC problem based on the k-means clustering algorithm. The main novelty of our solution lies in avoiding altogether the secure division operations required to compute cluster centers, through an efficient transformation technique. Our solution builds the clusters securely in an iterative fashion and returns the final cluster centers to all entities when a predetermined termination condition holds. The proposed solution protects the data confidentiality of all participating entities under the standard semi-honest model. To the best of our knowledge, ours is the first work to discuss and propose a comprehensive solution to the PPODC problem that incurs negligible cost on the participating entities. We theoretically estimate both the computation and communication costs of the proposed protocol and also demonstrate its practical value through experiments on a real dataset.
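One common way to avoid division when assigning a point to the nearest centroid $c_j = S_j / n_j$ (cluster sum over count) is to compare scaled distances: since $\lVert x - S_j/n_j \rVert^2 = \lVert n_j x - S_j \rVert^2 / n_j^2$, cluster $j$ is closer than cluster $b$ exactly when $\lVert n_j x - S_j \rVert^2 \, n_b^2 < \lVert n_b x - S_b \rVert^2 \, n_j^2$. The plaintext sketch below uses only integer addition and multiplication, which homomorphic schemes support; it is an illustration of the general idea and may differ from the paper's transformation, and no encryption is performed here.

```python
from typing import List, Sequence

def scaled_dist(x: Sequence[int], s: Sequence[int], n: int) -> int:
    """||n*x - S||^2, which equals n^2 * ||x - S/n||^2 without any division."""
    return sum((n * xi - si) ** 2 for xi, si in zip(x, s))

def nearest_cluster(x: Sequence[int], sums: List[Sequence[int]], counts: List[int]) -> int:
    """Index of the closest centroid S_j / n_j using only integer + and *."""
    best = 0
    for j in range(1, len(counts)):
        # d_j < d_best  <=>  scaled_j * n_best^2 < scaled_best * n_j^2  (counts > 0)
        lhs = scaled_dist(x, sums[j], counts[j]) * counts[best] ** 2
        rhs = scaled_dist(x, sums[best], counts[best]) * counts[j] ** 2
        if lhs < rhs:
            best = j
    return best
```

For clusters with sums (2, 2) and (30, 30) and counts 2 and 3 (centroids (1, 1) and (10, 10)), the point (9, 9) is assigned to the second cluster and (2, 1) to the first, with no centroid ever computed explicitly.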
CRJan 15, 2014
Lightweight and Secure Two-Party Range Queries over Outsourced Encrypted DatabasesBharath K. Samanthula, Wei Jiang, Elisa Bertino
With the many benefits of cloud computing, an entity may want to outsource its data and the related analytics tasks to a cloud. When data are sensitive, it is in the interest of the entity to outsource encrypted data to the cloud; however, this limits the types of operations that can be performed on the cloud side. In particular, evaluating queries over encrypted data stored on the cloud, without the entity performing any computation and without ever decrypting the data, becomes a very challenging problem. In this paper, we propose solutions to conduct range queries over outsourced encrypted data. Existing methods leak valuable information to the cloud, which can violate the security guarantees of the underlying encryption schemes. In general, the main security primitive used to evaluate range queries is secure comparison (SC) of encrypted integers. However, we observe that existing SC protocols are not very efficient. To this end, we first propose a novel SC scheme that takes encrypted integers and outputs an encrypted comparison result. We empirically show its practical advantage over the current state of the art. We then use the proposed SC scheme to construct two new secure range query protocols. Our protocols protect data confidentiality and the privacy of users' queries, and also preserve the semantic security of the encrypted data; therefore, they are more secure than existing protocols. Furthermore, our second protocol is lightweight at the user end, allowing an authorized user to perform range queries over outsourced encrypted data from any device with limited storage and computing capability.
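A range query reduces to two comparisons, $low \le r$ and $r \le high$. Many secure comparison protocols evaluate a bitwise circuit like the one below over encrypted bits: scanning from the most significant bit, the first position where the bits differ decides the result. This plaintext sketch only shows that comparison logic; it is not the paper's SC scheme, no encryption is involved, and the `width` parameter is an assumption of the sketch.

```python
from typing import List, Sequence

def to_bits(v: int, width: int) -> List[int]:
    """Bits of a non-negative integer, most significant first."""
    return [(v >> i) & 1 for i in reversed(range(width))]

def greater_equal(a: int, b: int, width: int) -> int:
    """Return 1 if a >= b, using only per-bit XOR / AND / NOT / OR."""
    gt, equal_so_far = 0, 1
    for ai, bi in zip(to_bits(a, width), to_bits(b, width)):
        gt |= equal_so_far & ai & (1 - bi)   # first differing bit has a=1, b=0
        equal_so_far &= 1 - (ai ^ bi)        # still equal on all higher bits?
    return gt | equal_so_far

def range_query(records: Sequence[int], low: int, high: int, width: int = 16) -> List[int]:
    """Records r with low <= r <= high, built from two comparisons each."""
    return [r for r in records if greater_equal(r, low, width) and greater_equal(high, r, width)]
```

For example, `range_query([1, 5, 9, 12], 4, 10)` keeps 5 and 9.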