CRMay 29
Investigating Detection and Obfuscation of Prompt Injection Attacks Against Software Reverse Engineering AI AgentsBrian Crawford, Patrick McClure
Agentic software reverse engineering systems are vulnerable to prompt injection attacks placed into the source code of executable binary files. This research demonstrates defensive tactics for detecting the presences of prompt injection strings in the decompiler output of adversarial example programs. Methods for obfuscating these attacks and subsequent methods for defending against these obfuscations are also explored. This research advances the understanding of risk and security of agentic software analysis systems necessary for their deployment into production-level cyber workflows.
CRMay 28
Automatically Attacking Software Reverse Engineering AI AgentsBrian Crawford, Justin Phillips, Patrick McClure
Software tools for reverse engineering executable binary files, such as Ghidra, enable malware analysts to safely conduct robust static analysis without having access to original source code. Coupled with the analytic power of large language models (LLM), agentic systems enabled with tools, such as GhidraMCP, can allow analysts to automate a previously human driven process. Although this automation can increase the productivity of a single malware analyst, it also introduces a new area of vulnerability for malware obfuscation. This paper presents an adversarial technique using genetic algorithm-based prompt generation, a modification of an adversarial attack known as AutoDAN, to demonstrate the ability to deceive LLM-powered disassembly and decompilation systems into misinterpreting binary executables, effectively corrupting their analytical output. This proof-of-concept methodology exploits inherent vulnerabilities in how LLMs process and interpret decompiled machine code via prompt injection by using extraneous string variable assignments to pass surreptitious instructions to the LLM while not impacting the functionality of the executable file. We demonstrate this capability through several concise examples. This approach could enable attackers to bypass automated detection systems that rely on LLM-driven analysis pipelines. By studying and understanding this attack, insights can be gained regarding the security implication of integrating LLMs into cybersecurity toolchains and building more robust agentic code analysis systems.
LGMay 2, 2022
VICE: Variational Interpretable Concept EmbeddingsLukas Muttenthaler, Charles Y. Zheng, Patrick McClure et al.
A central goal in the cognitive sciences is the development of numerical models for mental representations of object concepts. This paper introduces Variational Interpretable Concept Embeddings (VICE), an approximate Bayesian method for embedding object concepts in a vector space using data collected from humans in a triplet odd-one-out task. VICE uses variational inference to obtain sparse, non-negative representations of object concepts with uncertainty estimates for the embedding values. These estimates are used to automatically select the dimensions that best explain the data. We derive a PAC learning bound for VICE that can be used to estimate generalization performance or determine a sufficient sample size for experimental design. VICE rivals or outperforms its predecessor, SPoSE, at predicting human behavior in the triplet odd-one-out task. Furthermore, VICE's object representations are more reproducible and consistent across random initializations, highlighting the unique advantage of using VICE for deriving interpretable embeddings from human behavior.
LGFeb 6, 2023
Concrete Safety for ML Problems: System Safety for ML Development and AssessmentEdgar W. Jatho, Logan O. Mailloux, Eugene D. Williams et al.
Many stakeholders struggle to make reliances on ML-driven systems due to the risk of harm these systems may cause. Concerns of trustworthiness, unintended social harms, and unacceptable social and ethical violations undermine the promise of ML advancements. Moreover, such risks in complex ML-driven systems present a special challenge as they are often difficult to foresee, arising over periods of time, across populations, and at scale. These risks often arise not from poor ML development decisions or low performance directly but rather emerge through the interactions amongst ML development choices, the context of model use, environmental factors, and the effects of a model on its target. Systems safety engineering is an established discipline with a proven track record of identifying and managing risks even in high-complexity sociotechnical systems. In this work, we apply a state-of-the-art systems safety approach to concrete applications of ML with notable social and ethical risks to demonstrate a systematic means for meeting the assurance requirements needed to argue for safe and trustworthy ML in sociotechnical systems.
LGMar 22, 2024
Federated Bayesian Deep Learning: The Application of Statistical Aggregation Methods to Bayesian ModelsJohn Fischer, Marko Orescanin, Justin Loomis et al.
Federated learning (FL) is an approach to training machine learning models that takes advantage of multiple distributed datasets while maintaining data privacy and reducing communication costs associated with sharing local datasets. Aggregation strategies have been developed to pool or fuse the weights and biases of distributed deterministic models; however, modern deterministic deep learning (DL) models are often poorly calibrated and lack the ability to communicate a measure of epistemic uncertainty in prediction, which is desirable for remote sensing platforms and safety-critical applications. Conversely, Bayesian DL models are often well calibrated and capable of quantifying and communicating a measure of epistemic uncertainty along with a competitive prediction accuracy. Unfortunately, because the weights and biases in Bayesian DL models are defined by a probability distribution, simple application of the aggregation methods associated with FL schemes for deterministic models is either impossible or results in sub-optimal performance. In this work, we use independent and identically distributed (IID) and non-IID partitions of the CIFAR-10 dataset and a fully variational ResNet-20 architecture to analyze six different aggregation strategies for Bayesian DL models. Additionally, we analyze the traditional federated averaging approach applied to an approximate Bayesian Monte Carlo dropout model as a lightweight alternative to more complex variational inference methods in FL. We show that aggregation strategy is a key hyperparameter in the design of a Bayesian FL system with downstream effects on accuracy, calibration, uncertainty quantification, training stability, and client compute requirements.
CVSep 8, 2020
A Deep Neural Network Tool for Automatic Segmentation of Human Body Parts in Natural ScenesPatrick McClure, Gabrielle Reimann, Michal Ramot et al.
This short article describes a deep neural network trained to perform automatic segmentation of human body parts in natural scenes. More specifically, we trained a Bayesian SegNet with concrete dropout on the Pascal-Parts dataset to predict whether each pixel in a given frame was part of a person's hair, head, ear, eyebrows, legs, arms, mouth, neck, nose, or torso.
LGApr 23, 2020
Improving the Interpretability of fMRI Decoding using Deep Neural Networks and Adversarial RobustnessPatrick McClure, Dustin Moraczewski, Ka Chun Lam et al.
Deep neural networks (DNNs) are being increasingly used to make predictions from functional magnetic resonance imaging (fMRI) data. However, they are widely seen as uninterpretable "black boxes", as it can be difficult to discover what input information is used by the DNN in the process, something important in both cognitive neuroscience and clinical applications. A saliency map is a common approach for producing interpretable visualizations of the relative importance of input features for a prediction. However, methods for creating maps often fail due to DNNs being sensitive to input noise, or by focusing too much on the input and too little on the model. It is also challenging to evaluate how well saliency maps correspond to the truly relevant input information, as ground truth is not always available. In this paper, we review a variety of methods for producing gradient-based saliency maps, and present a new adversarial training method we developed to make DNNs robust to input noise, with the goal of improving interpretability. We introduce two quantitative evaluation procedures for saliency map methods in fMRI, applicable whenever a DNN or linear model is being trained to decode some information from imaging data. We evaluate the procedures using a synthetic dataset where the complex activation structure is known, and on saliency maps produced for DNN and linear models for task decoding in the Human Connectome Project (HCP) dataset. Our key finding is that saliency maps produced with different methods vary widely in interpretability, in both in synthetic and HCP fMRI data. Strikingly, even when DNN and linear models decode at comparable levels of performance, DNN saliency maps score higher on interpretability than linear model saliency maps (derived via weights or gradient). Finally, saliency maps produced with our adversarial training method outperform those from other methods.
CVDec 3, 2018
Knowing what you know in brain segmentation using Bayesian deep neural networksPatrick McClure, Nao Rho, John A. Lee et al.
In this paper, we describe a Bayesian deep neural network (DNN) for predicting FreeSurfer segmentations of structural MRI volumes, in minutes rather than hours. The network was trained and evaluated on a large dataset (n = 11,480), obtained by combining data from more than a hundred different sites, and also evaluated on another completely held-out dataset (n = 418). The network was trained using a novel spike-and-slab dropout-based variational inference approach. We show that, on these datasets, the proposed Bayesian DNN outperforms previously proposed methods, in terms of the similarity between the segmentation predictions and the FreeSurfer labels, and the usefulness of the estimate uncertainty of these predictions. In particular, we demonstrated that the prediction uncertainty of this network at each voxel is a good indicator of whether the network has made an error and that the uncertainty across the whole brain can predict the manual quality control ratings of a scan. The proposed Bayesian DNN method should be applicable to any new network architecture for addressing the segmentation problem.
LGMay 28, 2018
Distributed Weight Consolidation: A Brain Segmentation Case StudyPatrick McClure, Charles Y. Zheng, Jakub R. Kaczmarzyk et al.
Collecting the large datasets needed to train deep neural networks can be very difficult, particularly for the many applications for which sharing and pooling data is complicated by practical, ethical, or legal concerns. However, it may be the case that derivative datasets or predictive models developed within individual sites can be shared and combined with fewer restrictions. Training on distributed data and combining the resulting networks is often viewed as continual learning, but these methods require networks to be trained sequentially. In this paper, we introduce distributed weight consolidation (DWC), a continual learning method to consolidate the weights of separate neural networks, each trained on an independent dataset. We evaluated DWC with a brain segmentation case study, where we consolidated dilated convolutional neural networks trained on independent structural magnetic resonance imaging (sMRI) datasets from different sites. We found that DWC led to increased performance on test sets from the different sites, while maintaining generalization performance for a very large and completely independent multi-site dataset, compared to an ensemble baseline.
LGNov 5, 2016
Robustly representing uncertainty in deep neural networks through samplingPatrick McClure, Nikolaus Kriegeskorte
As deep neural networks (DNNs) are applied to increasingly challenging problems, they will need to be able to represent their own uncertainty. Modeling uncertainty is one of the key features of Bayesian methods. Using Bernoulli dropout with sampling at prediction time has recently been proposed as an efficient and well performing variational inference method for DNNs. However, sampling from other multiplicative noise based variational distributions has not been investigated in depth. We evaluated Bayesian DNNs trained with Bernoulli or Gaussian multiplicative masking of either the units (dropout) or the weights (dropconnect). We tested the calibration of the probabilistic predictions of Bayesian convolutional neural networks (CNNs) on MNIST and CIFAR-10. Sampling at prediction time increased the calibration of the DNNs' probabalistic predictions. Sampling weights, whether Gaussian or Bernoulli, led to more robust representation of uncertainty compared to sampling of units. However, using either Gaussian or Bernoulli dropout led to increased test set classification accuracy. Based on these findings we used both Bernoulli dropout and Gaussian dropconnect concurrently, which we show approximates the use of a spike-and-slab variational distribution without increasing the number of learned parameters. We found that spike-and-slab sampling had higher test set performance than Gaussian dropconnect and more robustly represented its uncertainty compared to Bernoulli dropout.
NENov 12, 2015
Representational Distance Learning for Deep Neural NetworksPatrick McClure, Nikolaus Kriegeskorte
Deep neural networks (DNNs) provide useful models of visual representational transformations. We present a method that enables a DNN (student) to learn from the internal representational spaces of a reference model (teacher), which could be another DNN or, in the future, a biological brain. Representational spaces of the student and the teacher are characterized by representational distance matrices (RDMs). We propose representational distance learning (RDL), a stochastic gradient descent method that drives the RDMs of the student to approximate the RDMs of the teacher. We demonstrate that RDL is competitive with other transfer learning techniques for two publicly available benchmark computer vision datasets (MNIST and CIFAR-100), while allowing for architectural differences between student and teacher. By pulling the student's RDMs towards those of the teacher, RDL significantly improved visual classification performance when compared to baseline networks that did not use transfer learning. In the future, RDL may enable combined supervised training of deep neural networks using task constraints (e.g. images and category labels) and constraints from brain-activity measurements, so as to build models that replicate the internal representational spaces of biological brains.