Scott Coull

3papers

58citations

Novelty65%

AI Score28

Ranked #154,401 of 201,326 authors (top 77%)#4,390 in CR (top 60%)

3 Papers

LGDec 5, 2022

Efficient Malware Analysis Using Metric Embeddings

Ethan M. Rudd, David Krisiloff, Scott Coull et al.

In this paper, we explore the use of metric learning to embed Windows PE files in a low-dimensional vector space for downstream use in a variety of applications, including malware detection, family classification, and malware attribute tagging. Specifically, we enrich labeling on malicious and benign PE files using computationally expensive, disassembly-based malicious capabilities. Using these capabilities, we derive several different types of metric embeddings utilizing an embedding neural network trained via contrastive loss, Spearman rank correlation, and combinations thereof. We then examine performance on a variety of transfer tasks performed on the EMBER and SOREL datasets, demonstrating that for several tasks, low-dimensional, computationally efficient metric embeddings maintain performance with little decay, which offers the potential to quickly retrain for a variety of transfer tasks at significantly reduced storage overhead. We conclude with an examination of practical considerations for the use of our proposed embedding approach, such as robustness to adversarial evasion and introduction of task-specific auxiliary objectives to improve performance on mission critical tasks.

CRMar 2, 2020

Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers

Giorgio Severi, Jim Meyer, Scott Coull et al.

Training pipelines for machine learning (ML) based malware classification often rely on crowdsourced threat feeds, exposing a natural attack injection point. In this paper, we study the susceptibility of feature-based ML malware classifiers to backdoor poisoning attacks, specifically focusing on challenging "clean label" attacks where attackers do not control the sample labeling process. We propose the use of techniques from explainable machine learning to guide the selection of relevant features and values to create effective backdoor triggers in a model-agnostic fashion. Using multiple reference datasets for malware classification, including Windows PE files, PDFs, and Android applications, we demonstrate effective attacks against a diverse set of machine learning models and evaluate the effect of various constraints imposed on the attacker. To demonstrate the feasibility of our backdoor attacks in practice, we create a watermarking utility for Windows PE files that preserves the binary's functionality, and we leverage similar behavior-preserving alteration methodologies for Android and PDF files. Finally, we experiment with potential defensive strategies and show the difficulties of completely defending against these attacks, especially when the attacks blend in with the legitimate sample distribution.

CRMar 8, 2014

Privacy Failures in Encrypted Messaging Services: Apple iMessage and Beyond

Scott Coull, Kevin Dyer

Instant messaging services are quickly becoming the most dominant form of communication among consumers around the world. Apple iMessage, for example, handles over 2 billion message each day, while WhatsApp claims 16 billion messages from 400 million international users. To protect user privacy, these services typically implement end-to-end and transport layer encryption, which are meant to make eavesdropping infeasible even for the service providers themselves. In this paper, however, we show that it is possible for an eavesdropper to learn information about user actions, the language of messages, and even the length of those messages with greater than 96% accuracy despite the use of state-of-the-art encryption technologies simply by observing the sizes of encrypted packet. While our evaluation focuses on Apple iMessage, the attacks are completely generic and we show how they can be applied to many popular messaging services, including WhatsApp, Viber, and Telegram.