Matthew Edwards

CR
7papers
113citations
Novelty38%
AI Score25

7 Papers

CROct 22, 2023
Text generation for dataset augmentation in security classification tasks

Alexander P. Welsh, Matthew Edwards

Security classifiers, designed to detect malicious content in computer systems and communications, can underperform when provided with insufficient training data. In the security domain, it is often easy to find samples of the negative (benign) class, and challenging to find enough samples of the positive (malicious) class to train an effective classifier. This study evaluates the application of natural language text generators to fill this data gap in multiple security-related text classification tasks. We describe a variety of previously-unexamined language-model fine-tuning approaches for this purpose and consider in particular the impact of disproportionate class-imbalances in the training set. Across our evaluation using three state-of-the-art classifiers designed for offensive language detection, review fraud detection, and SMS spam detection, we find that models trained with GPT-3 data augmentation strategies outperform both models trained without augmentation and models trained using basic data augmentation strategies already in common usage. In particular, we find substantial benefits for GPT-3 data augmentation strategies in situations with severe limitations on known positive-class samples.

CRMar 14, 2024Code
Helpful or Harmful? Exploring the Efficacy of Large Language Models for Online Grooming Prevention

Ellie Prosser, Matthew Edwards

Powerful generative Large Language Models (LLMs) are becoming popular tools amongst the general public as question-answering systems, and are being utilised by vulnerable groups such as children. With children increasingly interacting with these tools, it is imperative for researchers to scrutinise the safety of LLMs, especially for applications that could lead to serious outcomes, such as online child safety queries. In this paper, the efficacy of LLMs for online grooming prevention is explored both for identifying and avoiding grooming through advice generation, and the impact of prompt design on model performance is investigated by varying the provided context and prompt specificity. In results reflecting over 6,000 LLM interactions, we find that no models were clearly appropriate for online grooming prevention, with an observed lack of consistency in behaviours, and potential for harmful answer generation, especially from open-source models. We outline where and how models fall short, providing suggestions for improvement, and identify prompt designs that heavily altered model performance in troubling ways, with findings that can be used to inform best practice usage guides.

CYFeb 15, 2022
Characterising Cybercriminals: A Review

Matthew Edwards, Emma Williams, Claudia Peersman et al.

This review provides an overview of current research on the known characteristics and motivations of offenders engaging in cyber-dependent crimes. Due to the shifting dynamics of cybercriminal behaviour, and the availability of prior reviews in 2013, this review focuses on original research conducted from 2012 onwards, although some older studies that were not included in prior reviews are also considered. As a basis for interpretation of results, a limited quality assessment was also carried out on included studies through examination of key indicators.

CRApr 1, 2021
The best laid plans or lack thereof: Security decision-making of different stakeholder groups

Benjamin Shreeve, Joseph Hallett, Matthew Edwards et al.

Cyber security requirements are influenced by the priorities and decisions of a range of stakeholders. Board members and CISOs determine strategic priorities. Managers have responsibility for resource allocation and project management. Legal professionals concern themselves with regulatory compliance. Little is understood about how the security decision-making approaches of these different stakeholders contrast, and if particular groups of stakeholders have a better appreciation of security requirements during decision-making. Are risk analysts better decision makers than CISOs? Do security experts exhibit more effective strategies than board members? This paper explores the effect that different experience and diversity of expertise has on the quality of a team's cyber security decision-making and whether teams with members from more varied backgrounds perform better than those with more focused, homogeneous skill sets. Using data from 208 sessions and 948 players of a tabletop game run in the wild by a major national organization over 16 months, we explore how choices are affected by player background (e.g.,~cyber security experts versus risk analysts, board-level decision makers versus technical experts) and different team make-ups (homogeneous teams of security experts versus various mixes). We find that no group of experts makes significantly better game decisions than anyone else, and that their biases lead them to not fully comprehend what they are defending or how the defenses work.

IVAug 17, 2019
EigenRank by Committee: A Data Subset Selection and Failure Prediction paradigm for Robust Deep Learning based Medical Image Segmentation

Bilwaj Gaonkar, Joel Beckett, Mark Attiah et al.

Translation of fully automated deep learning based medical image segmentation technologies to clinical workflows face two main algorithmic challenges. The first, is the collection and archival of large quantities of manually annotated ground truth data for both training and validation. The second is the relative inability of the majority of deep learning based segmentation techniques to alert physicians to a likely segmentation failure. Here we propose a novel algorithm, named `Eigenrank' which addresses both of these challenges. Eigenrank can select for manual labeling, a subset of medical images from a large database, such that a U-Net trained on this subset is superior to one trained on a randomly selected subset of the same size. Eigenrank can also be used to pick out, cases in a large database, where deep learning segmentation will fail. We present our algorithm, followed by results and a discussion of how Eigenrank exploits the Von Neumann information to perform both data subset selection and failure prediction for medical image segmentation using deep learning.

CRMay 29, 2019
Automatically Dismantling Online Dating Fraud

Guillermo Suarez-Tangil, Matthew Edwards, Claudia Peersman et al.

Online romance scams are a prevalent form of mass-marketing fraud in the West, and yet few studies have addressed the technical or data-driven responses to this problem. In this type of scam, fraudsters craft fake profiles and manually interact with their victims. Because of the characteristics of this type of fraud and of how dating sites operate, traditional detection methods (e.g., those used in spam filtering) are ineffective. In this paper, we present the results of a multi-pronged investigation into the archetype of online dating profiles used in this form of fraud, including their use of demographics, profile descriptions, and images, shedding light on both the strategies deployed by scammers to appeal to victims and the traits of victims themselves. Further, in response to the severe financial and psychological harm caused by dating fraud, we develop a system to detect romance scammers on online dating platforms. Our work presents the first system for automatically detecting this fraud. Our aim is to provide an early detection system to stop romance scammers as they create fraudulent profiles or before they engage with potential victims. Previous research has indicated that the victims of romance scams score highly on scales for idealized romantic beliefs. We combine a range of structured, unstructured, and deep-learned features that capture these beliefs. No prior work has fully analyzed whether these notions of romance introduce traits that could be leveraged to build a detection system. Our ensemble machine-learning approach is robust to the omission of profile details and performs at high accuracy (97\%). The system enables development of automated tools for dating site providers and individual users.

CVOct 3, 2018
Extreme Augmentation : Can deep learning based medical image segmentation be trained using a single manually delineated scan?

Bilwaj Gaonkar, Matthew Edwards, Alex Bui et al.

Yes, it can. Data augmentation is perhaps the oldest preprocessing step in computer vision literature. Almost every computer vision model trained on imaging data uses some form of augmentation. In this paper, we use the inter-vertebral disk segmentation task alongside a deep residual U-Net as the learning model, to explore the effectiveness of augmentation. In the extreme, we observed that a model trained on patches extracted from just one scan, with each patch augmented 50 times; achieved a Dice score of 0.73 in a validation set of 40 cases. Qualitative evaluation indicated a clinically usable segmentation algorithm, which appropriately segments regions of interest, alongside limited false positive specks. When the initial patches are extracted from nine scans the average Dice coefficient jumps to 0.86 and most of the false positives disappear. While this still falls short of state-of-the-art deep learning based segmentation of discs reported in literature, qualitative examination reveals that it does yield segmentation, which can be amended by expert clinicians with minimal effort to generate additional data for training improved deep models. Extreme augmentation of training data, should thus be construed as a strategy for training deep learning based algorithms, when very little manually annotated data is available to work with. Models trained with extreme augmentation can then be used to accelerate the generation of manually labelled data. Hence, we show that extreme augmentation can be a valuable tool in addressing scaling up small imaging data sets to address medical image segmentation tasks.