Lewis D. Griffin

CL
16papers
2,075citations
Novelty48%
AI Score32

16 Papers

CLAug 24, 2023
Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities

Maximilian Mozes, Xuanli He, Bennett Kleinberg et al.

Spurred by the recent rapid increase in the development and distribution of large language models (LLMs) across industry and academia, much recent work has drawn attention to safety- and security-related threats and vulnerabilities of LLMs, including in the context of potentially criminal activities. Specifically, it has been shown that LLMs can be misused for fraud, impersonation, and the generation of malware; while other authors have considered the more general problem of AI alignment. It is important that developers and practitioners alike are aware of security-related problems with such models. In this paper, we provide an overview of existing - predominantly scientific - efforts on identifying and mitigating threats and vulnerabilities arising from LLMs. We present a taxonomy describing the relationship between threats caused by the generative capabilities of LLMs, prevention measures intended to address such threats, and vulnerabilities arising from imperfect prevention measures. With our work, we hope to raise awareness of the limitations of LLMs in light of such security concerns, among both experienced developers and novel users of such technologies.

LGSep 14, 2022
Using Forwards-Backwards Models to Approximate MDP Homomorphisms

Augustine N. Mavor-Parker, Matthew J. Sargent, Christian Pehle et al.

Reinforcement learning agents must painstakingly learn through trial and error what sets of state-action pairs are value equivalent -- requiring an often prohibitively large amount of environment experience. MDP homomorphisms have been proposed that reduce the MDP of an environment to an abstract MDP, enabling better sample efficiency. Consequently, impressive improvements have been achieved when a suitable homomorphism can be constructed a priori -- usually by exploiting a practitioner's knowledge of environment symmetries. We propose a novel approach to constructing homomorphisms in discrete action spaces, which uses a learnt model of environment dynamics to infer which state-action pairs lead to the same state -- which can reduce the size of the state-action space by a factor as large as the cardinality of the original action space. In MinAtar, we report an almost 4x improvement over a value-based off-policy baseline in the low sample limit, when averaging over all games and optimizers.

CLOct 20, 2022
Identifying Human Strategies for Generating Word-Level Adversarial Examples

Maximilian Mozes, Bennett Kleinberg, Lewis D. Griffin

Adversarial examples in NLP are receiving increasing research attention. One line of investigation is the generation of word-level adversarial examples against fine-tuned Transformer models that preserve naturalness and grammaticality. Previous work found that human- and machine-generated adversarial examples are comparable in their naturalness and grammatical correctness. Most notably, humans were able to generate adversarial examples much more effortlessly than automated attacks. In this paper, we provide a detailed analysis of exactly how humans create these adversarial examples. By exploring the behavioural patterns of human workers during the generation process, we identify statistically significant tendencies based on which words humans prefer to select for adversarial replacement (e.g., word frequencies, word saliencies, sentiment) as well as where and when words are replaced in an input sequence. With our findings, we seek to inspire efforts that harness human strategies for more robust NLP models.

CLAug 13, 2024
Evaluating Cultural Adaptability of a Large Language Model via Simulation of Synthetic Personas

Louis Kwok, Michal Bravansky, Lewis D. Griffin

The success of Large Language Models (LLMs) in multicultural environments hinges on their ability to understand users' diverse cultural backgrounds. We measure this capability by having an LLM simulate human profiles representing various nationalities within the scope of a questionnaire-style psychological experiment. Specifically, we employ GPT-3.5 to reproduce reactions to persuasive news articles of 7,286 participants from 15 countries; comparing the results with a dataset of real participants sharing the same demographic traits. Our analysis shows that specifying a person's country of residence improves GPT-3.5's alignment with their responses. In contrast, using native language prompting introduces shifts that significantly reduce overall alignment, with some languages particularly impairing performance. These findings suggest that while direct nationality information enhances the model's cultural adaptability, native language cues do not reliably improve simulation fidelity and can detract from the model's effectiveness.

CLApr 12, 2022
Self-Supervised Losses for One-Class Textual Anomaly Detection

Kimberly T. Mai, Toby Davies, Lewis D. Griffin

Current deep learning methods for anomaly detection in text rely on supervisory signals in inliers that may be unobtainable or bespoke architectures that are difficult to tune. We study a simpler alternative: fine-tuning Transformers on the inlier data with self-supervised objectives and using the losses as an anomaly score. Overall, the self-supervision approach outperforms other methods under various anomaly detection scenarios, improving the AUROC score on semantic anomalies by 11.6% and on syntactic anomalies by 22.8% on average. Additionally, the optimal objective and resultant learnt representation depend on the type of downstream anomaly. The separability of anomalies and inliers signals that a representation is more effective for detecting semantic anomalies, whilst the presence of narrow feature directions signals a representation that is effective for detecting syntactic anomalies.

LGSep 15, 2023
Understanding the limitations of self-supervised learning for tabular anomaly detection

Kimberly T. Mai, Toby Davies, Lewis D. Griffin

While self-supervised learning has improved anomaly detection in computer vision and natural language processing, it is unclear whether tabular data can benefit from it. This paper explores the limitations of self-supervision for tabular anomaly detection. We conduct several experiments spanning various pretext tasks on 26 benchmark datasets to understand why this is the case. Our results confirm representations derived from self-supervision do not improve tabular anomaly detection performance compared to using the raw representations of the data. We show this is due to neural networks introducing irrelevant features, which reduces the effectiveness of anomaly detectors. However, we demonstrate that using a subspace of the neural network's representation can recover performance.

LGFeb 8, 2021Code
How to Stay Curious while Avoiding Noisy TVs using Aleatoric Uncertainty Estimation

Augustine N. Mavor-Parker, Kimberly A. Young, Caswell Barry et al.

Exploration in environments with sparse rewards is difficult for artificial agents. Curiosity driven learning -- using feed-forward prediction errors as intrinsic rewards -- has achieved some success in these scenarios, but fails when faced with action-dependent noise sources. We present aleatoric mapping agents (AMAs), a neuroscience inspired solution modeled on the cholinergic system of the mammalian brain. AMAs aim to explicitly ascertain which dynamics of the environment are unpredictable, regardless of whether those dynamics are induced by the actions of the agent. This is achieved by generating separate forward predictions for the mean and variance of future states and reducing intrinsic rewards for those transitions with high aleatoric variance. We show AMAs are able to effectively circumvent action-dependent stochastic traps that immobilise conventional curiosity driven agents. The code for all experiments presented in this paper is open sourced: http://github.com/self-supervisor/Escaping-Stochastic-Traps-With-Aleatoric-Mapping-Agents.

CLSep 9, 2021
Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification

Maximilian Mozes, Max Bartolo, Pontus Stenetorp et al.

Research shows that natural language processing models are generally considered to be vulnerable to adversarial attacks; but recent work has drawn attention to the issue of validating these adversarial inputs against certain criteria (e.g., the preservation of semantics and grammaticality). Enforcing constraints to uphold such criteria may render attacks unsuccessful, raising the question of whether valid attacks are actually feasible. In this work, we investigate this through the lens of human language ability. We report on crowdsourcing studies in which we task humans with iteratively modifying words in an input text, while receiving immediate model feedback, with the aim of causing a sentiment classification model to misclassify the example. Our findings suggest that humans are capable of generating a substantial amount of adversarial examples using semantics-preserving word substitutions. We analyze how human-generated adversarial examples compare to the recently proposed TextFooler, Genetic, BAE and SememePSO attack algorithms on the dimensions naturalness, preservation of sentiment, grammaticality and substitution rate. Our findings suggest that human-generated adversarial examples are not more able than the best algorithms to generate natural-reading, sentiment-preserving examples, though they do so by being much more computationally efficient.

LGApr 21, 2021
Brittle Features May Help Anomaly Detection

Kimberly T. Mai, Toby Davies, Lewis D. Griffin

One-class anomaly detection is challenging. A representation that clearly distinguishes anomalies from normal data is ideal, but arriving at this representation is difficult since only normal data is available at training time. We examine the performance of representations, transferred from auxiliary tasks, for anomaly detection. Our results suggest that the choice of representation is more important than the anomaly detector used with these representations, although knowledge distillation can work better than using the representations directly. In addition, separability between anomalies and normal data is important but not the sole factor for a good representation, as anomaly detection performance is also correlated with more adversarially brittle features in the representation space. Finally, we show our configuration can detect 96.4% of anomalies in a genuine X-ray security dataset, outperforming previous results.

CLApr 13, 2020
Frequency-Guided Word Substitutions for Detecting Textual Adversarial Examples

Maximilian Mozes, Pontus Stenetorp, Bennett Kleinberg et al.

Recent efforts have shown that neural text processing models are vulnerable to adversarial examples, but the nature of these examples is poorly understood. In this work, we show that adversarial attacks against CNN, LSTM and Transformer-based classification models perform word substitutions that are identifiable through frequency differences between replaced words and their corresponding substitutions. Based on these findings, we propose frequency-guided word substitutions (FGWS), a simple algorithm exploiting the frequency properties of adversarial word substitutions for the detection of adversarial examples. FGWS achieves strong performance by accurately detecting adversarial examples on the SST-2 and IMDb sentiment datasets, with F1 detection scores of up to 91.4% against RoBERTa-based classification models. We compare our approach against a recently proposed perturbation discrimination framework and show that we outperform it by up to 13.0% F1.

IVFeb 18, 2020
Conditional Adversarial Camera Model Anonymization

Jerone T. A. Andrews, Yidan Zhang, Lewis D. Griffin

The model of camera that was used to capture a particular photographic image (model attribution) is typically inferred from high-frequency model-specific artifacts present within the image. Model anonymization is the process of transforming these artifacts such that the apparent capture model is changed. We propose a conditional adversarial approach for learning such transformations. In contrast to previous works, we cast model anonymization as the process of transforming both high and low spatial frequency information. We augment the objective with the loss from a pre-trained dual-stream model attribution classifier, which constrains the generative network to transform the full range of artifacts. Quantitative comparisons demonstrate the efficacy of our framework in a restrictive non-interactive black-box setting.

CVJun 20, 2019
Multiple-Identity Image Attacks Against Face-based Identity Verification

Jerone T. A. Andrews, Thomas Tanay, Lewis D. Griffin

Facial verification systems are vulnerable to poisoning attacks that make use of multiple-identity images (MIIs)---face images stored in a database that resemble multiple persons, such that novel images of any of the constituent persons are verified as matching the identity of the MII. Research on this mode of attack has focused on defence by detection, with no explanation as to why the vulnerability exists. New quantitative results are presented that support an explanation in terms of the geometry of the representations spaces used by the verification systems. In the spherical geometry of those spaces, the angular distance distributions of matching and non-matching pairs of face representations are only modestly separated, approximately centred at 90 and 40-60 degrees, respectively. This is sufficient for open-set verification on normal data but provides an opportunity for MII attacks. Our analysis considers ideal MII algorithms, demonstrating that, if realisable, they would deliver faces roughly 45 degrees from their constituent faces, thus classed as matching them. We study the performance of three methods for MII generation---gallery search, image space morphing, and representation space inversion---and show that the latter two realise the ideal well enough to produce effective attacks, while the former could succeed but only with an implausibly large gallery to search. Gallery search and inversion MIIs depend on having access to a facial comparator, for optimisation, but our results show that these attacks can still be effective when attacking disparate comparators, thus securing a deployed comparator is an insufficient defence.

CVJun 19, 2018
Built-in Vulnerabilities to Imperceptible Adversarial Perturbations

Thomas Tanay, Jerone T. A. Andrews, Lewis D. Griffin

Designing models that are robust to small adversarial perturbations of their inputs has proven remarkably difficult. In this work we show that the reverse problem---making models more vulnerable---is surprisingly easy. After presenting some proofs of concept on MNIST, we introduce a generic tilting attack that injects vulnerabilities into the linear layers of pre-trained networks by increasing their sensitivity to components of low variance in the training data without affecting their performance on test data. We illustrate this attack on a multilayer perceptron trained on SVHN and use it to design a stand-alone adversarial module which we call a steganogram decoder. Finally, we show on CIFAR-10 that a poisoning attack with a poisoning rate as low as 0.1% can induce vulnerabilities to chosen imperceptible backdoor signals in state-of-the-art networks. Beyond their practical implications, these different results shed new light on the nature of the adversarial example phenomenon.

CVSep 9, 2016
Automated detection of smuggled high-risk security threats using Deep Learning

Nicolas Jaccard, Thomas W. Rogers, Edward J. Morton et al.

The security infrastructure is ill-equipped to detect and deter the smuggling of non-explosive devices that enable terror attacks such as those recently perpetrated in western Europe. The detection of so-called "small metallic threats" (SMTs) in cargo containers currently relies on statistical risk analysis, intelligence reports, and visual inspection of X-ray images by security officers. The latter is very slow and unreliable due to the difficulty of the task: objects potentially spanning less than 50 pixels have to be detected in images containing more than 2 million pixels against very complex and cluttered backgrounds. In this contribution, we demonstrate for the first time the use of Convolutional Neural Networks (CNNs), a type of Deep Learning, to automate the detection of SMTs in fullsize X-ray images of cargo containers. Novel approaches for dataset augmentation allowed to train CNNs from-scratch despite the scarcity of data available. We report fewer than 6% false alarms when detecting 90% SMTs synthetically concealed in stream-of-commerce images, which corresponds to an improvement of over an order of magnitude over conventional approaches such as Bag-of-Words (BoWs). The proposed scheme offers potentially super-human performance for a fraction of the time it would take for a security officers to carry out visual inspection (processing time is approximately 3.5s per container image).

CVAug 2, 2016
Automated X-ray Image Analysis for Cargo Security: Critical Review and Future Promise

Thomas W. Rogers, Nicolas Jaccard, Edward J. Morton et al.

We review the relatively immature field of automated image analysis for X-ray cargo imagery. There is increasing demand for automated analysis methods that can assist in the inspection and selection of containers, due to the ever-growing volumes of traded cargo and the increasing concerns that customs- and security-related threats are being smuggled across borders by organised crime and terrorist networks. We split the field into the classical pipeline of image preprocessing and image understanding. Preprocessing includes: image manipulation; quality improvement; Threat Image Projection (TIP); and material discrimination and segmentation. Image understanding includes: Automated Threat Detection (ATD); and Automated Contents Verification (ACV). We identify several gaps in the literature that need to be addressed and propose ideas for future research. Where the current literature is sparse we borrow from the single-view, multi-view, and CT X-ray baggage domains, which have some characteristics in common with X-ray cargo.

CVJun 26, 2016
Detection of concealed cars in complex cargo X-ray imagery using Deep Learning

Nicolas Jaccard, Thomas W. Rogers, Edward J. Morton et al.

Non-intrusive inspection systems based on X-ray radiography techniques are routinely used at transport hubs to ensure the conformity of cargo content with the supplied shipping manifest. As trade volumes increase and regulations become more stringent, manual inspection by trained operators is less and less viable due to low throughput. Machine vision techniques can assist operators in their task by automating parts of the inspection workflow. Since cars are routinely involved in trafficking, export fraud, and tax evasion schemes, they represent an attractive target for automated detection and flagging for subsequent inspection by operators. In this contribution, we describe a method for the detection of cars in X-ray cargo images based on trained-from-scratch Convolutional Neural Networks. By introducing an oversampling scheme that suitably addresses the low number of car images available for training, we achieved 100% car image classification rate for a false positive rate of 1-in-454. Cars that were partially or completely obscured by other goods, a modus operandi frequently adopted by criminals, were correctly detected. We believe that this level of performance suggests that the method is suitable for deployment in the field. It is expected that the generic object detection workflow described can be extended to other object classes given the availability of suitable training data.