LGSep 19, 2023
Model Leeching: An Extraction Attack Targeting LLMsLewis Birch, William Hackett, Stefan Trawicki et al.
Model Leeching is a novel extraction attack targeting Large Language Models (LLMs), capable of distilling task-specific knowledge from a target LLM into a reduced parameter model. We demonstrate the effectiveness of our attack by extracting task capability from ChatGPT-3.5-Turbo, achieving 73% Exact Match (EM) similarity, and SQuAD EM and F1 accuracy scores of 75% and 87%, respectively for only $50 in API cost. We further demonstrate the feasibility of adversarial attack transferability from an extracted model extracted via Model Leeching to perform ML attack staging against a target LLM, resulting in an 11% increase to attack success rate when applied to ChatGPT-3.5-Turbo.
CRSep 13, 2022
PINCH: An Adversarial Extraction Attack Framework for Deep Learning ModelsWilliam Hackett, Stefan Trawicki, Zhengxin Yu et al.
Adversarial extraction attacks constitute an insidious threat against Deep Learning (DL) models in-which an adversary aims to steal the architecture, parameters, and hyper-parameters of a targeted DL model. Existing extraction attack literature have observed varying levels of attack success for different DL models and datasets, yet the underlying cause(s) behind their susceptibility often remain unclear, and would help facilitate creating secure DL systems. In this paper we present PINCH: an efficient and automated extraction attack framework capable of designing, deploying, and analyzing extraction attack scenarios across heterogeneous hardware platforms. Using PINCH, we perform extensive experimental evaluation of extraction attacks against 21 model architectures to explore new extraction attack scenarios and further attack staging. Our findings show (1) key extraction characteristics whereby particular model configurations exhibit strong resilience against specific attacks, (2) even partial extraction success enables further staging for other adversarial attacks, and (3) equivalent stolen models uncover differences in expressive power, yet exhibit similar captured knowledge.
CROct 1, 2022
Privacy-preserving Decentralized Federated Learning over Time-varying Communication GraphYang Lu, Zhengxin Yu, Neeraj Suri
Establishing how a set of learners can provide privacy-preserving federated learning in a fully decentralized (peer-to-peer, no coordinator) manner is an open problem. We propose the first privacy-preserving consensus-based algorithm for the distributed learners to achieve decentralized global model aggregation in an environment of high mobility, where the communication graph between the learners may vary between successive rounds of model aggregation. In particular, in each round of global model aggregation, the Metropolis-Hastings method is applied to update the weighted adjacency matrix based on the current communication topology. In addition, the Shamir's secret sharing scheme is integrated to facilitate privacy in reaching consensus of the global model. The paper establishes the correctness and privacy properties of the proposed algorithm. The computational efficiency is evaluated by a simulation built on a federated learning framework with a real-word dataset.
LGSep 20, 2023
Compilation as a Defense: Enhancing DL Model Attack Robustness via Tensor OptimizationStefan Trawicki, William Hackett, Lewis Birch et al.
Adversarial Machine Learning (AML) is a rapidly growing field of security research, with an often overlooked area being model attacks through side-channels. Previous works show such attacks to be serious threats, though little progress has been made on efficient remediation strategies that avoid costly model re-engineering. This work demonstrates a new defense against AML side-channel attacks using model compilation techniques, namely tensor optimization. We show relative model attack effectiveness decreases of up to 43% using tensor optimization, discuss the implications, and direction of future work.
CVJul 5, 2024
Self-Supervised Representation Learning for Adversarial Attack DetectionYi Li, Plamen Angelov, Neeraj Suri
Supervised learning-based adversarial attack detection methods rely on a large number of labeled data and suffer significant performance degradation when applying the trained model to new domains. In this paper, we propose a self-supervised representation learning framework for the adversarial attack detection task to address this drawback. Firstly, we map the pixels of augmented input images into an embedding space. Then, we employ the prototype-wise contrastive estimation loss to cluster prototypes as latent variables. Additionally, drawing inspiration from the concept of memory banks, we introduce a discrimination bank to distinguish and learn representations for each individual instance that shares the same or a similar prototype, establishing a connection between instances and their associated prototypes. We propose a parallel axial-attention (PAA)-based encoder to facilitate the training process by parallel training over height- and width-axis of attention maps. Experimental results show that, compared to various benchmark self-supervised vision learning models and supervised adversarial attack detection methods, the proposed model achieves state-of-the-art performance on the adversarial attack detection task across a wide range of images.
CRApr 15, 2025
Bypassing LLM Guardrails: An Empirical Analysis of Evasion Attacks against Prompt Injection and Jailbreak Detection SystemsWilliam Hackett, Lewis Birch, Stefan Trawicki et al.
Large Language Models (LLMs) guardrail systems are designed to protect against prompt injection and jailbreak attacks. However, they remain vulnerable to evasion techniques. We demonstrate two approaches for bypassing LLM prompt injection and jailbreak detection systems via traditional character injection methods and algorithmic Adversarial Machine Learning (AML) evasion techniques. Through testing against six prominent protection systems, including Microsoft's Azure Prompt Shield and Meta's Prompt Guard, we show that both methods can be used to evade detection while maintaining adversarial utility achieving in some instances up to 100% evasion success. Furthermore, we demonstrate that adversaries can enhance Attack Success Rates (ASR) against black-box targets by leveraging word importance ranking computed by offline white-box models. Our findings reveal vulnerabilities within current LLM protection mechanisms and highlight the need for more robust guardrail systems.
CVJun 24, 2024
UNICAD: A Unified Approach for Attack Detection, Noise Reduction and Novel Class IdentificationAlvaro Lopez Pellicer, Kittipos Giatgong, Yi Li et al.
As the use of Deep Neural Networks (DNNs) becomes pervasive, their vulnerability to adversarial attacks and limitations in handling unseen classes poses significant challenges. The state-of-the-art offers discrete solutions aimed to tackle individual issues covering specific adversarial attack scenarios, classification or evolving learning. However, real-world systems need to be able to detect and recover from a wide range of adversarial attacks without sacrificing classification accuracy and to flexibly act in {\bf unseen} scenarios. In this paper, UNICAD, is proposed as a novel framework that integrates a variety of techniques to provide an adaptive solution. For the targeted image classification, UNICAD achieves accurate image classification, detects unseen classes, and recovers from adversarial attacks using Prototype and Similarity-based DNNs with denoising autoencoders. Our experiments performed on the CIFAR-10 dataset highlight UNICAD's effectiveness in adversarial mitigation and unseen class classification, outperforming traditional models.
CVJun 22, 2024
Federated Adversarial Learning for Robust Autonomous Landing Runway DetectionYi Li, Plamen Angelov, Zhengxin Yu et al.
As the development of deep learning techniques in autonomous landing systems continues to grow, one of the major challenges is trust and security in the face of possible adversarial attacks. In this paper, we propose a federated adversarial learning-based framework to detect landing runways using paired data comprising of clean local data and its adversarial version. Firstly, the local model is pre-trained on a large-scale lane detection dataset. Then, instead of exploiting large instance-adaptive models, we resort to a parameter-efficient fine-tuning method known as scale and shift deep features (SSF), upon the pre-trained model. Secondly, in each SSF layer, distributions of clean local data and its adversarial version are disentangled for accurate statistics estimation. To the best of our knowledge, this marks the first instance of federated learning work that address the adversarial sample problem in landing runway detection. Our experimental evaluations over both synthesis and real images of Landing Approach Runway Detection (LARD) dataset consistently demonstrate good performance of the proposed federated adversarial learning and robust to adversarial attacks.
CRSep 9, 2020
A Security Architecture for Railway SignallingChristian Schlehuber, Markus Heinrich, Tsvetoslava Vateva-Gurova et al.
We present the proposed security architecture Deutsche Bahn plans to deploy to protect its trackside safety-critical signalling system against cyber-attacks. We first present the existing reference interlocking system that is built using standard components. Next, we present a taxonomy to help model the attack vectors relevant for the railway environment. Building upon this, we present the proposed "compartmentalized" defence concept for securing the upcoming signalling systems.
CRApr 12, 2018
QRES: Quantitative Reasoning on Encrypted Security SLAsAhmed Taha, Spyros Boukoros, Jesus Luna et al.
While regulators advocate for higher cloud transparency, many Cloud Service Providers (CSPs) often do not provide detailed information regarding their security implementations in their Service Level Agreements (SLAs). In practice, CSPs are hesitant to release detailed information regarding their security posture for security and proprietary reasons. This lack of transparency hinders the adoption of cloud computing by enterprises and individuals. Unless CSPs share information regarding the technical details of their security proceedings and standards, customers cannot verify which cloud provider matched their needs in terms of security and privacy guarantees. To address this problem, we propose QRES, the first system that enables (a) CSPs to disclose detailed information about their offered security services in an encrypted form to ensure data confidentiality, and (b) customers to assess the CSPs' offered security services and find those satisfying their security requirements. Our system preserves each party's privacy by leveraging a novel evaluation method based on Secure Two Party Computation (2PC) and Searchable Encryption techniques. We implement QRES and highlight its usefulness by applying it to existing standardized SLAs. The real world tests illustrate that the system runs in acceptable time for practical application even when used with a multitude of CSPs. We formally prove the security requirements of the proposed system against a strong realistic adversarial model, using an automated cryptographic protocol verifier.