LG · Oct 28, 2022 · Code
Coverage-centric Coreset Selection for High Pruning Rates
Haizhong Zheng, Rui Liu, Fan Lai et al.
One-shot coreset selection aims to select a representative subset of the training data, given a pruning rate, that can later be used to train future models while retaining high accuracy. State-of-the-art coreset selection methods pick the highest importance examples based on an importance metric and are found to perform well at low pruning rates. However, at high pruning rates, they suffer from a catastrophic accuracy drop, performing worse than even random sampling. This paper explores the reasons behind this accuracy drop both theoretically and empirically. We first propose a novel metric to measure the coverage of a dataset on a specific distribution by extending the classical geometric set cover problem to a distribution cover problem. This metric helps explain why coresets selected by SOTA methods at high pruning rates perform poorly compared to random sampling because of worse data coverage. We then propose a novel one-shot coreset selection method, Coverage-centric Coreset Selection (CCS), that jointly considers overall data coverage upon a distribution as well as the importance of each example. We evaluate CCS on five datasets and show that, at high pruning rates (e.g., 90%), it achieves significantly better accuracy than previous SOTA methods (e.g., at least 19.56% higher on CIFAR10) as well as random selection (e.g., 7.04% higher on CIFAR10) and comparable accuracy at low pruning rates. We make our code publicly available at https://github.com/haizhongzheng/Coverage-centric-coreset-selection.
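The abstract does not spell out the CCS algorithm. As a rough illustration of balancing coverage against importance, the sketch below assumes a simple scheme: discard the hardest fraction of examples, bin the remaining importance scores into equal-width strata, and spread the selection budget evenly across strata. The function `stratified_coreset` and its parameters are hypothetical; this is not the authors' released code (see the linked repository for that).

```python
import numpy as np


def stratified_coreset(scores, budget, prune_hardest=0.0, num_strata=10, seed=0):
    """Pick `budget` indices spread across difficulty strata (illustrative sketch).

    scores: 1-D array of per-example importance/difficulty (higher = harder).
    prune_hardest: fraction of the hardest examples discarded first (assumed knob).
    num_strata: number of equal-width score bins (assumed knob).
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)  # easiest -> hardest
    keep = order[: len(order) - int(prune_hardest * len(order))]
    kept_scores = scores[keep]

    # Assign each kept example to one of `num_strata` equal-width score bins.
    edges = np.linspace(kept_scores.min(), kept_scores.max(), num_strata + 1)
    bin_ids = np.clip(np.digitize(kept_scores, edges[1:-1]), 0, num_strata - 1)
    strata = [keep[bin_ids == b] for b in range(num_strata)]
    strata = [s for s in strata if len(s) > 0]

    # Visit strata from smallest to largest so unused budget from small
    # strata flows to the larger ones.
    strata.sort(key=len)
    rng = np.random.default_rng(seed)
    selected, remaining = [], budget
    for i, s in enumerate(strata):
        share = remaining // (len(strata) - i)
        take = min(len(s), share)
        selected.extend(rng.choice(s, size=take, replace=False).tolist())
        remaining -= take
    return np.array(selected)
```

Spreading the budget across strata, instead of taking only the top-scoring examples, is meant to keep easy and medium-difficulty regions of the data represented when the pruning rate is high.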
LG · Mar 4, 2022
Concept-based Explanations for Out-Of-Distribution Detectors
Jihye Choi, Jayaram Raghuram, Ryan Feng et al.
Out-of-distribution (OOD) detection plays a crucial role in ensuring the safe deployment of deep neural network (DNN) classifiers. While a myriad of methods have focused on improving the performance of OOD detectors, a critical gap remains in interpreting their decisions. We help bridge this gap by providing explanations for OOD detectors based on learned high-level concepts. We first propose two new metrics for assessing the effectiveness of a particular set of concepts for explaining OOD detectors: 1) detection completeness, which quantifies the sufficiency of concepts for explaining an OOD-detector's decisions, and 2) concept separability, which captures the distributional separation between in-distribution and OOD data in the concept space. Based on these metrics, we propose an unsupervised framework for learning a set of concepts that satisfy the desired properties of high detection completeness and concept separability, and demonstrate its effectiveness in providing concept-based explanations for diverse off-the-shelf OOD detectors. We also show how to identify prominent concepts contributing to the detection results, and provide further reasoning about their decisions.
CR · Mar 11, 2023
Stateful Defenses for Machine Learning Models Are Not Yet Secure Against Black-box Attacks
Ryan Feng, Ashish Hooda, Neal Mangaokar et al.
Recent work has proposed stateful defense models (SDMs) as a compelling strategy to defend against a black-box attacker who only has query access to the model, as is common for online machine learning platforms. Such stateful defenses aim to defend against black-box attacks by tracking the query history and detecting and rejecting queries that are "similar", thus preventing black-box attacks from finding useful gradients and making progress towards finding adversarial examples within a reasonable query budget. Recent SDMs (e.g., Blacklight and PIHA) have shown remarkable success in defending against state-of-the-art black-box attacks. In this paper, we show that SDMs are highly vulnerable to a new class of adaptive black-box attacks. We propose a novel adaptive black-box attack strategy called Oracle-guided Adaptive Rejection Sampling (OARS) that involves two stages: (1) use initial query patterns to infer key properties about an SDM's defense; and (2) leverage those extracted properties to design subsequent query patterns to evade the SDM's defense while making progress towards finding adversarial inputs. OARS is broadly applicable as an enhancement to existing black-box attacks: we show how to apply the strategy to enhance six common black-box attacks to be more effective against the current class of SDMs. For example, OARS-enhanced versions of black-box attacks improved attack success rate against recent stateful defenses from almost 0% to almost 100% for multiple datasets within reasonable query budgets.
LG · Jul 30, 2023
Theoretically Principled Trade-off for Stateful Defenses against Query-Based Black-Box Attacks
Ashish Hooda, Neal Mangaokar, Ryan Feng et al.
Adversarial examples threaten the integrity of machine learning systems with alarming success rates even under constrained black-box conditions. Stateful defenses have emerged as an effective countermeasure, detecting potential attacks by maintaining a buffer of recent queries and detecting new queries that are too similar. However, these defenses fundamentally pose a trade-off between attack detection and false positive rates, and this trade-off is typically optimized by hand-picking feature extractors and similarity thresholds that empirically work well. There is little current understanding as to the formal limits of this trade-off and the exact properties of the feature extractors/underlying problem domain that influence it. This work aims to address this gap by offering a theoretical characterization of the trade-off between detection and false positive rates for stateful defenses. We provide upper bounds for detection rates of a general class of feature extractors and analyze the impact of this trade-off on the convergence of black-box attacks. We then support our theoretical findings with empirical evaluations across multiple datasets and stateful defenses.
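The defense mechanism described above, a buffer of recent queries plus a similarity check, can be sketched as follows. The identity feature extractor, L2 distance, and FIFO buffer are illustrative assumptions, not the configuration of any published stateful defense, and the class name is hypothetical.

```python
from collections import deque

import numpy as np


class StatefulDetector:
    """Toy stateful defense: flag a query if it is too close to a recent one."""

    def __init__(self, threshold, buffer_size=1000):
        self.threshold = threshold
        # A bounded deque drops the oldest entries automatically.
        self.buffer = deque(maxlen=buffer_size)

    def features(self, x):
        # Placeholder extractor (assumption); real defenses use learned or
        # perceptual-hash features instead of raw inputs.
        return np.asarray(x, dtype=float).ravel()

    def check(self, x):
        """Return True if the query should be rejected, then record it."""
        f = self.features(x)
        flagged = any(np.linalg.norm(f - g) < self.threshold for g in self.buffer)
        self.buffer.append(f)  # every query is remembered, flagged or not
        return flagged
```

The `threshold` parameter is where the detection/false-positive trade-off shows up: a larger value flags more attack queries that take small steps, but also rejects more benign queries that happen to resemble recent ones.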
LG · May 18, 2022
Defending Object Detectors against Patch Attacks with Out-of-Distribution Smoothing
Ryan Feng, Neal Mangaokar, Jihye Choi et al.
Patch attacks against object detectors have been of recent interest due to their being physically realizable and more closely aligned with practical systems. In response to this threat, many new defenses have been proposed that train a patch segmenter model to detect and remove the patch before the image is passed to the downstream model. We unify these approaches with a flexible framework, OODSmoother, which characterizes the properties of approaches that aim to remove adversarial patches. This framework naturally guides us to design 1) a novel adaptive attack that breaks existing patch attack defenses on object detectors, and 2) a novel defense approach SemPrior that takes advantage of semantic priors. Our key insight behind SemPrior is that the existing machine learning-based patch detectors struggle to learn semantic priors and that explicitly incorporating them can improve performance. We find that SemPrior alone provides up to a 40% gain, or up to a 60% gain when combined with existing defenses.
CV · Jun 1, 2023
CALICO: Self-Supervised Camera-LiDAR Contrastive Pre-training for BEV Perception
Jiachen Sun, Haizhong Zheng, Qingzhao Zhang et al.
Perception is crucial in the realm of autonomous driving systems, where bird's eye view (BEV)-based architectures have recently reached state-of-the-art performance. The desirability of self-supervised representation learning stems from the expensive and laborious process of annotating 2D and 3D data. Although previous research has investigated pretraining methods for both LiDAR and camera-based 3D object detection, a unified pretraining framework for multimodal BEV perception is missing. In this study, we introduce CALICO, a novel framework that applies contrastive objectives to both LiDAR and camera backbones. Specifically, CALICO incorporates two stages: point-region contrast (PRC) and region-aware distillation (RAD). PRC better balances the region- and scene-level representation learning on the LiDAR modality and offers significant performance improvement compared to existing methods. RAD effectively achieves contrastive distillation on our self-trained teacher model. CALICO's efficacy is substantiated by extensive evaluations on 3D object detection and BEV map segmentation tasks, where it delivers significant performance improvements. Notably, CALICO outperforms the baseline method by 10.5% on NDS and 8.6% on mAP. Moreover, CALICO boosts the robustness of multimodal 3D object detection against adversarial attacks and corruption. Additionally, our framework can be tailored to different backbones and heads, positioning it as a promising approach for multimodal BEV perception.
CV · Oct 11, 2023
Leveraging Hierarchical Feature Sharing for Efficient Dataset Condensation
Haizhong Zheng, Jiachen Sun, Shutong Wu et al.
Given a real-world dataset, data condensation (DC) aims to synthesize a small synthetic dataset that captures the knowledge of a natural dataset while being usable for training models with comparable accuracy. Recent works propose to enhance DC with data parameterization, which condenses data into very compact parameterized data containers instead of images. The intuition behind data parameterization is to encode shared features of images to avoid additional storage costs. In this paper, we recognize that images share common features in a hierarchical way due to the inherent hierarchical structure of the classification system, which is overlooked by current data parameterization methods. To better align DC with this hierarchical nature and encourage more efficient information sharing inside data containers, we propose a novel data parameterization architecture, Hierarchical Memory Network (HMN). HMN stores condensed data in a three-tier structure, representing the dataset-level, class-level, and instance-level features. Another helpful property of the hierarchical architecture is that HMN naturally ensures good independence among images despite achieving information sharing. This enables instance-level pruning of HMN to remove redundant information, further improving performance. We evaluate HMN on five public datasets and show that our proposed method outperforms all baselines.
CR · Feb 24, 2024 · Code
PRP: Propagating Universal Perturbations to Attack Large Language Model Guard-Rails
Neal Mangaokar, Ashish Hooda, Jihye Choi et al.
Large language models (LLMs) are typically aligned to be harmless to humans. Unfortunately, recent work has shown that such models are susceptible to automated jailbreak attacks that induce them to generate harmful content. More recent LLMs often incorporate an additional layer of defense, a Guard Model, which is a second LLM that is designed to check and moderate the output response of the primary LLM. Our key contribution is a novel attack strategy, PRP, that succeeds against several open-source (e.g., Llama 2) and closed-source (e.g., GPT 3.5) implementations of Guard Models. PRP leverages a two-step prefix-based attack that operates by (a) constructing a universal adversarial prefix for the Guard Model, and (b) propagating this prefix to the response. We find that this procedure is effective across multiple threat models, including ones in which the adversary has no access to the Guard Model at all. Our work suggests that further advances are required on defenses and Guard Models before they can be considered effective.
CV · Jul 14, 2025 · Code
Test-Time Canonicalization by Foundation Models for Robust Perception
Utkarsh Singhal, Ryan Feng, Stella X. Yu et al.
Perception in the real world requires robustness to diverse viewing conditions. Existing approaches often rely on specialized architectures or training with predefined data augmentations, limiting adaptability. Taking inspiration from mental rotation in human vision, we propose FOCAL, a test-time robustness framework that transforms the input into the most typical view. At inference time, FOCAL explores a set of transformed images and chooses the one with the highest likelihood under foundation model priors. This test-time optimization boosts robustness while requiring no retraining or architectural changes. Applied to models like CLIP and SAM, it significantly boosts robustness across a wide range of transformations, including 2D and 3D rotations, contrast and lighting shifts, and day-night changes. We also explore potential applications in active vision. By reframing invariance as a test-time optimization problem, FOCAL offers a general and scalable approach to robustness. Our code is available at: https://github.com/sutkarsh/focal.
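The core test-time loop described above, trying candidate transformations and keeping the most plausible result, can be sketched as below. Here `score_fn` stands in for a foundation-model likelihood (an assumption; FOCAL's actual scoring and search procedure are in the linked repository), and the function name `canonicalize` is hypothetical.

```python
import numpy as np


def canonicalize(image, score_fn, transforms):
    """Return the transformed input that the prior scores as most typical.

    score_fn: stand-in for a foundation-model likelihood; any callable that
        maps an image to a scalar works for this sketch.
    transforms: candidate functions to explore (e.g. rotations).
    Returns (best_candidate, index_of_best_transform).
    """
    candidates = [t(image) for t in transforms]
    scores = [score_fn(c) for c in candidates]
    best = int(np.argmax(scores))
    return candidates[best], best
```

For example, with the four 90-degree rotations `[lambda im, k=k: np.rot90(im, k) for k in range(4)]` as candidates and a toy score that prefers a bright top row, a rotated input is turned back to its upright view. No model weights change; only the input is searched over.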
CL · Feb 9, 2024
Learn To be Efficient: Build Structured Sparsity in Large Language Models
Haizhong Zheng, Xiaoyan Bai, Xueshen Liu et al.
Large Language Models (LLMs) have achieved remarkable success with billions of parameters, yet they incur high inference overheads. The emergence of activation sparsity in LLMs provides a natural approach to reduce this cost by involving only parts of the parameters for inference. However, existing methods only focus on utilizing this naturally formed activation sparsity in a post-training setting, overlooking the potential for further amplifying this inherent sparsity. In this paper, we hypothesize that LLMs can learn to be efficient by achieving more structured activation sparsity. To achieve this, we introduce a novel training algorithm, Learn-To-be-Efficient (LTE), designed to train efficiency-aware LLMs to learn to activate fewer neurons and achieve a better trade-off between sparsity and performance. Furthermore, unlike SOTA MoEfication methods, which mainly focus on ReLU-based models, LTE can also be applied to LLMs like LLaMA using non-ReLU activations. Extensive evaluations on language understanding, language generation, and instruction tuning tasks show that LTE consistently outperforms SOTA baselines. Along with our hardware-aware custom kernel implementation, LTE reduces LLaMA2-7B inference latency by 25% at 50% sparsity.
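Activation sparsity saves compute because neurons with zero activation contribute nothing to the next matrix product. The sketch below shows this for a single token passing through a ReLU feed-forward block: skipping inactive neurons gives the same output as the dense computation. It illustrates only the naturally formed, post-training sparsity the abstract contrasts with; it does not implement LTE's training algorithm, and the function names are hypothetical.

```python
import numpy as np


def relu_ffn_dense(x, w_in, w_out):
    """Dense feed-forward block: every neuron takes part in both matmuls."""
    return np.maximum(x @ w_in, 0.0) @ w_out


def relu_ffn_sparse(x, w_in, w_out):
    """Same output, but only rows of w_out for active neurons are used.

    Note: the first projection is still computed densely here; a practical
    system would need to predict active neurons to skip that work too.
    """
    h = np.maximum(x @ w_in, 0.0)
    active = np.flatnonzero(h)  # indices of neurons with nonzero activation
    return h[active] @ w_out[active]
```

The fewer neurons fire, the fewer rows of `w_out` must be read, which is the kind of saving that training for more (and more structured) sparsity aims to enlarge.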
CL · Feb 19, 2024
Plato: Plan to Efficiently Decode for Large Language Model Inference
Shuowei Jin, Xueshen Liu, Yongji Wu et al.
Large language models (LLMs) have achieved remarkable success in natural language tasks, but their inference incurs substantial computational and memory overhead. To improve efficiency, parallel decoding methods like Skeleton-of-Thought (SoT) decompose prompts into sub-problems for concurrent processing. However, these methods significantly compromise answer quality by treating semantically linked sub-problems as independent. We propose Plato, a novel approach that co-designs algorithms and systems for semantic-aware parallel decoding. Plato leverages LLMs to organize sub-problems into a dependency graph based on logical and causal relationships, enabling concurrent decoding of non-dependent nodes while preserving answer coherence and quality. To further enhance efficiency, Plato pipelines planning and node decoding stages, implements a global context cache, and carefully structures node inference prompts to maximize key-value cache reuse and minimize overhead. Our evaluations show that Plato improves throughput by 68% over autoregressive decoding while achieving a 40% net win rate in answer quality. Compared to SoT, Plato demonstrates a remarkable 90% quality net-win rate. Ablation studies reveal that our pipeline design improves speedup by 29%, while our KV cache reuse optimization reduces overhead by 75%.
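Decoding non-dependent nodes concurrently amounts to scheduling a dependency graph. A minimal sketch, assuming the plan is already available as a dict mapping each sub-problem to the sub-problems it depends on, groups nodes into waves with Kahn's topological sort; nodes in the same wave have no dependencies on each other and could be decoded in parallel. The function name and input format are hypothetical, not Plato's interface.

```python
from collections import defaultdict


def decode_waves(deps):
    """Group sub-problems into waves that can be decoded concurrently.

    deps: dict mapping node -> list of nodes it depends on (every node must
        appear as a key). Returns a list of waves (sorted lists of nodes).
    """
    indegree = {n: len(parents) for n, parents in deps.items()}
    children = defaultdict(list)
    for node, parents in deps.items():
        for p in parents:
            children[p].append(node)

    wave = sorted(n for n, d in indegree.items() if d == 0)
    waves = []
    while wave:
        waves.append(wave)
        nxt = []
        for n in wave:
            for c in children[n]:
                indegree[c] -= 1  # one more dependency of c is finished
                if indegree[c] == 0:
                    nxt.append(c)
        wave = sorted(nxt)

    if sum(len(w) for w in waves) != len(deps):
        raise ValueError("dependency graph has a cycle")
    return waves
```

For example, `decode_waves({"intro": [], "pros": ["intro"], "cons": ["intro"], "summary": ["pros", "cons"]})` yields `[["intro"], ["cons", "pros"], ["summary"]]`. Waves are a conservative schedule: a finer-grained scheduler could start each node as soon as its own dependencies finish.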
LG · Jul 15, 2025
Class-Proportional Coreset Selection for Difficulty-Separable Data
Elisa Tsai, Haizhong Zheng, Atul Prakash
High-quality training data is essential for building reliable and efficient machine learning systems. One-shot coreset selection addresses this by pruning the dataset while maintaining or even improving model performance, often relying on training-dynamics-based data difficulty scores. However, most existing methods implicitly assume class-wise homogeneity in data difficulty, overlooking variation in data difficulty across different classes. In this work, we challenge this assumption by showing that, in domains such as network intrusion detection and medical imaging, data difficulty often clusters by class. We formalize this as class-difficulty separability and introduce the Class Difficulty Separability Coefficient (CDSC) as a quantitative measure. We demonstrate that high CDSC values correlate with performance degradation in class-agnostic coreset methods, which tend to overrepresent easy majority classes while neglecting rare but informative ones. To address this, we introduce class-proportional variants of multiple sampling strategies. Evaluated on five diverse datasets spanning security and medical domains, our methods consistently achieve state-of-the-art performance. For instance, on CTU-13, at an extreme 99% pruning rate, a class-proportional variant of Coverage-centric Coreset Selection (CCS-CP) shows remarkable stability, with accuracy dropping only 2.58%, precision 0.49%, and recall 0.19%. In contrast, the class-agnostic CCS baseline, the next best method, suffers sharper declines of 7.59% in accuracy, 4.57% in precision, and 4.11% in recall. We further show that aggressive pruning enhances generalization in noisy, imbalanced, and large-scale datasets. Our results underscore that explicitly modeling class-difficulty separability leads to more effective, robust, and generalizable data pruning, particularly in high-stakes scenarios.
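The class-proportional idea can be sketched as two steps: split the budget across classes in proportion to class size, then run any class-agnostic selector inside each class. The helper names and the `select_fn` interface below are hypothetical; for CCS-CP the per-class selector would presumably be CCS itself.

```python
import numpy as np


def class_proportional_budgets(labels, budget):
    """Split a selection budget across classes in proportion to class size.

    Uses largest-remainder rounding so the per-class budgets sum to `budget`.
    """
    classes, counts = np.unique(labels, return_counts=True)
    exact = counts / counts.sum() * budget
    alloc = np.floor(exact).astype(int)
    # Hand out the leftover slots to classes with the largest fractional parts.
    leftover = budget - alloc.sum()
    order = np.argsort(-(exact - alloc), kind="stable")
    alloc[order[:leftover]] += 1
    return dict(zip(classes.tolist(), alloc.tolist()))


def class_proportional_select(labels, scores, budget, select_fn):
    """Run a class-agnostic selector independently inside each class.

    select_fn(class_scores, k) must return k positions into class_scores.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    chosen = []
    for cls, k in class_proportional_budgets(labels, budget).items():
        idx = np.flatnonzero(labels == cls)
        picked = select_fn(scores[idx], k)  # positions within this class
        chosen.extend(idx[picked].tolist())
    return np.array(chosen)
```

Because every class keeps its share of the budget, a minority class whose examples all look "easy" cannot be squeezed out by a global difficulty ranking, which is the failure mode the abstract attributes to class-agnostic methods.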
CR · Jul 8, 2025
Taming Data Challenges in ML-based Security Tasks: Lessons from Integrating Generative AI
Shravya Kanchi, Neal Mangaokar, Aravind Cheruvu et al.
Machine learning-based supervised classifiers are widely used for security tasks, and their improvement has been largely focused on algorithmic advancements. We argue that data challenges that negatively impact the performance of these classifiers have received limited attention. We address the following research question: Can developments in Generative AI (GenAI) address these data challenges and improve classifier performance? We propose augmenting training datasets with synthetic data generated using GenAI techniques to improve classifier generalization. We evaluate this approach across 7 diverse security tasks using 6 state-of-the-art GenAI methods and introduce a novel GenAI scheme called Nimai that enables highly controlled data synthesis. We find that GenAI techniques can significantly improve the performance of security classifiers, achieving improvements of up to 32.6% even in severely data-constrained settings (only ~180 training samples). Furthermore, we demonstrate that GenAI can facilitate rapid adaptation to concept drift post-deployment, requiring minimal labeling in the adjustment process. Despite successes, our study finds that some GenAI schemes struggle to initialize (train and produce data) on certain security tasks. We also identify characteristics of specific tasks, such as noisy labels, overlapping class distributions, and sparse feature vectors, which limit the performance gains from GenAI. We believe that our study will drive the development of future GenAI tools designed for security tasks.
CV · Jun 6, 2024
ELFS: Label-Free Coreset Selection with Proxy Training Dynamics
Haizhong Zheng, Elisa Tsai, Yifu Lu et al.
High-quality human-annotated data is crucial for modern deep learning pipelines, yet the human annotation process is both costly and time-consuming. Given a constrained human labeling budget, selecting an informative and representative data subset for labeling can significantly reduce human annotation effort. Well-performing state-of-the-art (SOTA) coreset selection methods require ground truth labels over the whole dataset, failing to reduce the human labeling burden. Meanwhile, SOTA label-free coreset selection methods deliver inferior performance due to poor geometry-based difficulty scores. In this paper, we introduce ELFS (Effective Label-Free Coreset Selection), a novel label-free coreset selection method. ELFS significantly improves label-free coreset selection by addressing two challenges: 1) ELFS utilizes deep clustering to estimate training dynamics-based data difficulty scores without ground truth labels; 2) Pseudo-labels introduce a distribution shift in the data difficulty scores, and we propose a simple but effective double-end pruning method to mitigate bias on calculated scores. We evaluate ELFS on four vision benchmarks and show that, given the same vision encoder, ELFS consistently outperforms SOTA label-free baselines. For instance, when using SwAV as the encoder, ELFS outperforms D2 by up to 10.2% in accuracy on ImageNet-1K. We make our code publicly available on GitHub.
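A minimal sketch of double-end pruning, under the assumption (not spelled out in the abstract) that it works by first discarding the hardest fraction of examples and then keeping the hardest of the rest, so that both the hardest and the easiest ends of the difficulty ranking are removed. The function name and the `hard_frac` knob are hypothetical.

```python
import numpy as np


def double_end_prune(scores, budget, hard_frac):
    """Keep `budget` examples after trimming both ends of the difficulty ranking.

    scores: per-example difficulty (higher = harder), e.g. estimated from
        pseudo-labels. hard_frac: fraction of the hardest examples to drop.
    """
    order = np.argsort(scores)[::-1]  # hardest first
    n_hard = int(hard_frac * len(order))
    remaining = order[n_hard:]  # step 1: drop the hardest end
    if budget > len(remaining):
        raise ValueError("budget exceeds examples left after hard pruning")
    # Step 2: keep the hardest of the rest, which drops the easiest end.
    return remaining[:budget]
```

Trimming the hardest end first means the selection no longer depends on the most extreme scores, which is one plausible way to limit the bias that pseudo-labels introduce into difficulty estimates.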
LG · Feb 11, 2022
D4: Detection of Adversarial Diffusion Deepfakes Using Disjoint Ensembles
Ashish Hooda, Neal Mangaokar, Ryan Feng et al.
Detecting diffusion-generated deepfake images remains an open problem. Current detection methods fail against an adversary who adds imperceptible adversarial perturbations to the deepfake to evade detection. In this work, we propose Disjoint Diffusion Deepfake Detection (D4), a deepfake detector designed to improve black-box adversarial robustness beyond de facto solutions such as adversarial training. D4 uses an ensemble of models over disjoint subsets of the frequency spectrum to significantly improve adversarial robustness. Our key insight is to leverage a redundancy in the frequency domain and apply a saliency partitioning technique to disjointly distribute frequency components across multiple models. We formally prove that these disjoint ensembles lead to a reduction in the dimensionality of the input subspace where adversarial deepfakes lie, thereby making adversarial deepfakes harder to find for black-box attacks. We then empirically validate the D4 method against several black-box attacks and find that D4 significantly outperforms existing state-of-the-art defenses applied to diffusion-generated deepfake detection. We also demonstrate that D4 provides robustness against adversarial deepfakes from unseen data distributions as well as unseen generative techniques.
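The disjoint-ensemble idea can be sketched with a 2-D FFT: rank frequency components by a saliency map, deal them out round-robin so each model receives a disjoint subset, and give each model the inverse transform of its masked spectrum. The saliency map, the round-robin rule, and the function name are assumptions for illustration; the paper's saliency partitioning may differ.

```python
import numpy as np


def disjoint_frequency_views(image, saliency, num_models):
    """Split a grayscale image's spectrum into `num_models` disjoint views.

    image: (H, W) array. saliency: (H, W) per-frequency importance (assumed
    given). Frequencies are ranked by saliency and dealt round-robin so each
    model receives a mix of high- and low-saliency components.
    """
    spectrum = np.fft.fft2(image)
    ranked = np.argsort(saliency, axis=None)[::-1]  # flat indices, most salient first
    views = []
    for m in range(num_models):
        mask = np.zeros(image.size, dtype=bool)
        mask[ranked[m::num_models]] = True  # this model's disjoint share
        masked = np.where(mask.reshape(image.shape), spectrum, 0)
        # Each ensemble member would classify this band-limited image.
        views.append(np.real(np.fft.ifft2(masked)))
    return views
```

Because the masks are disjoint and together cover the whole spectrum, the views sum back to the original image. Taking the real part of each inverse transform is a simplification; masks that respect the spectrum's conjugate symmetry would avoid discarding an imaginary component.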
LG · Jul 1, 2021
Using Anomaly Feature Vectors for Detecting, Classifying and Warning of Outlier Adversarial Examples
Nelson Manohar-Alers, Ryan Feng, Sahib Singh et al.
We present DeClaW, a system for detecting, classifying, and warning of adversarial inputs presented to a classification neural network. In contrast to current state-of-the-art methods that, given an input, detect whether an input is clean or adversarial, we aim to also identify the types of adversarial attack (e.g., PGD, Carlini-Wagner or clean). To achieve this, we extract statistical profiles, which we term anomaly feature vectors (AFVs), from a set of latent features. Preliminary findings suggest that AFVs can help distinguish among several types of adversarial attacks (e.g., PGD versus Carlini-Wagner) with close to 93% accuracy on the CIFAR-10 dataset. The results open the door to using AFV-based methods for exploring not only adversarial attack detection but also classification of the attack type and then the design of attack-specific mitigation strategies.
CV · Dec 3, 2020
Content-Adaptive Pixel Discretization to Improve Model Robustness
Ryan Feng, Wu-chi Feng, Atul Prakash
Preprocessing defenses such as pixel discretization are appealing to remove adversarial attacks due to their simplicity. However, they have been shown to be ineffective except on simple datasets like MNIST. We hypothesize that existing discretization approaches failed because using a fixed codebook for the entire dataset limits their ability to balance image representation and codeword separability. We first formally prove that adaptive codebooks can provide stronger robustness guarantees than fixed codebooks as a preprocessing defense on some datasets. Based on that insight, we propose a content-adaptive pixel discretization defense called Essential Features, which discretizes the image to a per-image adaptive codebook to reduce the color space. We then find that Essential Features can be further optimized by applying adaptive blurring before the discretization to push perturbed pixel values back to their original value before determining the codebook. Against adaptive attacks, we show that content-adaptive pixel discretization extends the range of datasets that benefit in terms of both L2 and L∞ robustness where previously fixed codebooks were found to have failed. Our findings suggest that content-adaptive pixel discretization should be part of the repertoire for making models robust.
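A per-image codebook can be sketched with k-means over the pixels of a single image, followed by snapping each pixel to its nearest codeword. This is a minimal illustration of content-adaptive discretization under that assumption; it omits the adaptive blurring step, and `adaptive_discretize` is a hypothetical name, not the Essential Features implementation.

```python
import numpy as np


def adaptive_discretize(image, k, iters=10, seed=0):
    """Quantize an (H, W, C) image to a per-image codebook of k colors.

    The codebook is fit with plain k-means on this image's own pixels, which
    is what makes it content-adaptive (a fixed codebook would be shared
    across the whole dataset).
    """
    pixels = image.reshape(-1, image.shape[-1]).astype(float)
    rng = np.random.default_rng(seed)
    codebook = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every pixel to every codeword: shape (num_pixels, k).
        dists = np.linalg.norm(pixels[:, None, :] - codebook[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = pixels[assign == j]
            if len(members):  # keep the old codeword if a cluster empties
                codebook[j] = members.mean(axis=0)
    dists = np.linalg.norm(pixels[:, None, :] - codebook[None, :, :], axis=-1)
    return codebook[dists.argmin(axis=1)].reshape(image.shape)
```

Small perturbations that do not move a pixel across a cluster boundary are erased by the snapping step, while a per-image codebook keeps enough colors to represent images that a single dataset-wide palette would distort.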
CV · Jul 17, 2020
Understanding and Diagnosing Vulnerability under Adversarial Attacks
Haizhong Zheng, Ziqi Zhang, Honglak Lee et al.
Deep Neural Networks (DNNs) are known to be vulnerable to adversarial attacks. Currently, there is no clear insight into how slight perturbations cause such a large difference in classification results and how we can design a more robust model architecture. In this work, we propose a novel interpretability method, InterpretGAN, to generate explanations for features used for classification in latent variables. Interpreting the classification process of adversarial examples exposes how adversarial perturbations influence features layer by layer as well as which features are modified by perturbations. Moreover, we design the first diagnostic method to quantify the vulnerability contributed by each layer, which can be used to identify vulnerable parts of model architectures. The diagnostic results show that the layers introducing more information loss tend to be more vulnerable than other layers. Based on the findings, our evaluation results on MNIST and CIFAR10 datasets suggest that average pooling layers, with lower information loss, are more robust than max pooling layers for the network architectures studied in this paper.
LG · May 8, 2020
Towards Robustness against Unsuspicious Adversarial Examples
Liang Tong, Minzhe Guo, Atul Prakash et al.
Despite the remarkable success of deep neural networks, significant concerns have emerged about their robustness to adversarial perturbations to inputs. While most attacks aim to ensure that these are imperceptible, physical perturbation attacks typically aim for being unsuspicious, even if perceptible. However, there is no universal notion of what it means for adversarial examples to be unsuspicious. We propose an approach for modeling suspiciousness by leveraging cognitive salience. Specifically, we split an image into foreground (salient region) and background (the rest), and allow significantly larger adversarial perturbations in the background, while ensuring that cognitive salience of background remains low. We describe how to compute the resulting non-salience-preserving dual-perturbation attacks on classifiers. We then experimentally demonstrate that our attacks indeed do not significantly change perceptual salience of the background, but are highly effective against classifiers robust to conventional attacks. Furthermore, we show that adversarial training with dual-perturbation attacks yields classifiers that are more robust to these than state-of-the-art robust learning approaches, and comparable in terms of robustness to conventional attacks.
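The dual-perturbation threat model can be sketched as a projection with different L∞ budgets for foreground and background. The function and its parameters are hypothetical, and the sketch omits the constraint that keeps the background's cognitive salience low.

```python
import numpy as np


def project_dual(delta, fg_mask, eps_fg, eps_bg):
    """Clip a perturbation to a per-region L-infinity budget.

    fg_mask: boolean array, True on the salient foreground.
    The background may receive a larger budget (eps_bg > eps_fg).
    """
    eps = np.where(fg_mask, eps_fg, eps_bg)  # per-pixel budget
    return np.clip(delta, -eps, eps)
```

Inside a PGD-style loop, each gradient step would be followed by `delta = project_dual(delta, fg_mask, eps_fg, eps_bg)`, so the attack can spend most of its perturbation where it is least likely to draw attention.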
ML · May 6, 2020
MAZE: Data-Free Model Stealing Attack Using Zeroth-Order Gradient Estimation
Sanjay Kariyappa, Atul Prakash, Moinuddin Qureshi
Model Stealing (MS) attacks allow an adversary with black-box access to a Machine Learning model to replicate its functionality, compromising the confidentiality of the model. Such attacks train a clone model by using the predictions of the target model for different inputs. The effectiveness of such attacks relies heavily on the availability of data necessary to query the target model. Existing attacks either assume partial access to the dataset of the target model or availability of an alternate dataset with semantic similarities. This paper proposes MAZE, a data-free model stealing attack using zeroth-order gradient estimation. In contrast to prior works, MAZE does not require any data and instead creates synthetic data using a generative model. Inspired by recent works in data-free Knowledge Distillation (KD), we train the generative model using a disagreement objective to produce inputs that maximize disagreement between the clone and the target model. However, unlike the white-box setting of KD, where the gradient information is available, training a generator for model stealing requires performing black-box optimization, as it involves accessing the target model under attack. MAZE relies on zeroth-order gradient estimation to perform this optimization and enables a highly accurate MS attack. Our evaluation with four datasets shows that MAZE provides a normalized clone accuracy in the range of 0.91x to 0.99x, and outperforms even the recent attacks that rely on partial data (JBDA, clone accuracy 0.13x to 0.69x) and surrogate data (KnockoffNets, clone accuracy 0.52x to 0.97x). We also study an extension of MAZE in the partial-data setting and develop MAZE-PD, which generates synthetic data closer to the target distribution. MAZE-PD further improves the clone accuracy (0.97x to 1.0x) and reduces the number of queries required for the attack by 2x-24x.
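Zeroth-order gradient estimation approximates a gradient using only function values, which is what makes black-box optimization possible. The sketch below uses a standard random-direction forward-difference estimator; MAZE's exact estimator, loss, and hyperparameters may differ, and `zo_gradient` is a hypothetical name.

```python
import numpy as np


def zo_gradient(f, x, num_dirs=50, mu=1e-3, seed=0):
    """Estimate grad f(x) from function evaluations only.

    Uses forward differences along random unit directions u. Since
    E[u u^T] = I / d for u uniform on the unit sphere, the average of
    (f(x + mu*u) - f(x)) / mu * u is scaled by d to be (nearly) unbiased.
    """
    rng = np.random.default_rng(seed)
    d = x.size
    fx = f(x)
    grad = np.zeros_like(x, dtype=float)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)
        u /= np.linalg.norm(u)  # random direction on the unit sphere
        grad += (f(x + mu * u) - fx) / mu * u
    return grad * d / num_dirs
```

Each estimate costs `num_dirs + 1` queries to `f`, which is why query count is a central cost for attacks that optimize through a black-box model.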
CR · Feb 17, 2020
GRAPHITE: Generating Automatic Physical Examples for Machine-Learning Attacks on Computer Vision Systems
Ryan Feng, Neal Mangaokar, Jiefeng Chen et al.
This paper investigates an adversary's ease of attack in generating adversarial examples for real-world scenarios. We address three key requirements for practical real-world attacks: 1) automatically constraining the size and shape of the attack so it can be applied with stickers, 2) transform-robustness, i.e., robustness of an attack to environmental physical variations such as viewpoint and lighting changes, and 3) supporting attacks in not only white-box, but also black-box hard-label scenarios, so that the adversary can attack proprietary models. In this work, we propose GRAPHITE, an efficient and general framework for generating attacks that satisfy the above three key requirements. GRAPHITE takes advantage of transform-robustness, a metric based on expectation over transforms (EoT), to automatically generate small masks and optimize with gradient-free optimization. GRAPHITE is also flexible as it can easily trade off transform-robustness, perturbation size, and query count in black-box settings. On a GTSRB model in a hard-label black-box setting, we are able to find attacks on all possible 1,806 victim-target class pairs with averages of 77.8% transform-robustness, perturbation size of 16.63% of the victim images, and 126K queries per pair. For digital-only attacks where achieving transform-robustness is not a requirement, GRAPHITE is able to find successful small-patch attacks with an average of only 566 queries for 92.2% of victim-target pairs. GRAPHITE is also able to find successful attacks using perturbations that modify small areas of the input image against PatchGuard, a recently proposed defense against patch-based attacks.
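Transform-robustness, as described above, is an expectation-over-transforms quantity: the fraction of sampled physical variations under which the attack still succeeds. A minimal Monte Carlo sketch, assuming a hard-label `classify` function and a list of transform callables (both hypothetical stand-ins):

```python
import numpy as np


def transform_robustness(classify, x_adv, target, transforms):
    """Fraction of sampled transforms under which x_adv is classified as target.

    classify: hard-label model, image -> class id (only labels are needed,
        matching the black-box hard-label setting).
    transforms: sampled environmental variations (viewpoint, lighting, ...).
    """
    hits = [classify(t(x_adv)) == target for t in transforms]
    return float(np.mean(hits))
```

Because only predicted labels are used, the estimate can be computed with queries alone, and the number of sampled transforms directly trades estimate precision against query count.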
LG · Dec 27, 2019
Efficient Adversarial Training with Transferable Adversarial Examples
Haizhong Zheng, Ziqi Zhang, Juncheng Gu et al.
Adversarial training is an effective defense method to protect classification models against adversarial attacks. However, one limitation of this approach is that it can require orders of magnitude more training time due to the high cost of generating strong adversarial examples during training. In this paper, we first show that there is high transferability between models from neighboring epochs in the same training process, i.e., adversarial examples from one epoch continue to be adversarial in subsequent epochs. Leveraging this property, we propose a novel method, Adversarial Training with Transferable Adversarial Examples (ATTA), that can enhance the robustness of trained models and greatly improve the training efficiency by accumulating adversarial perturbations through epochs. Compared to state-of-the-art adversarial training methods, ATTA enhances adversarial accuracy by up to 7.2% on CIFAR10 and requires 12-14x less training time on MNIST and CIFAR10 datasets with comparable model robustness.
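The accumulation idea can be sketched as a per-example perturbation store: each epoch's attack starts from the perturbation saved in the previous epoch and takes only a cheap step or two. This is a hypothetical sketch of the bookkeeping only; `grad_fn` is an assumed input-gradient callable, and practical details such as clipping to the valid pixel range are omitted.

```python
import numpy as np


class PerturbationBank:
    """Carry each example's adversarial perturbation across epochs."""

    def __init__(self, shape, eps):
        # shape: (num_examples, *input_shape); one stored perturbation each.
        self.delta = np.zeros(shape)
        self.eps = eps

    def attack(self, idx, x, grad_fn, step_size, steps=1):
        """Warm-start an L-inf sign-gradient attack from the saved perturbation.

        idx: indices of the examples in this batch. x: the clean batch.
        grad_fn(x_adv) -> gradient of the loss w.r.t. the input (assumed given).
        """
        d = self.delta[idx]
        for _ in range(steps):
            d = d + step_size * np.sign(grad_fn(x + d))
            d = np.clip(d, -self.eps, self.eps)  # stay in the L-inf ball
        self.delta[idx] = d  # saved as the starting point for the next epoch
        return x + d
```

With one step per epoch, the perturbation for each example keeps growing in strength across epochs instead of being rebuilt from zero, which is where the savings over multi-step attacks per batch would come from.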
CV · Nov 27, 2019
Can Attention Masks Improve Adversarial Robustness?
Pratik Vaishnavi, Tianji Cong, Kevin Eykholt et al.
Deep Neural Networks (DNNs) are known to be susceptible to adversarial examples: maliciously crafted inputs that are designed to fool a model but appear normal to humans. Recent work has shown that pixel discretization can make MNIST classifiers highly robust to adversarial examples. However, pixel discretization fails to provide significant protection on more complex datasets. In this paper, we take the first step towards reconciling these contrary findings. Focusing on the observation that pixel discretization in MNIST makes the background completely black and the foreground completely white, we hypothesize that the important property for increasing robustness is the elimination of the image background using attention masks before classifying an object. To examine this hypothesis, we create foreground attention masks for two different datasets, GTSRB and MS-COCO. Our initial results suggest that using attention masks leads to improved robustness. On adversarially trained classifiers, we see an adversarial robustness increase of over 20% on MS-COCO.
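Eliminating the background with an attention mask amounts to an element-wise product between the image and a binary foreground mask before the image reaches the classifier. The minimal sketch below assumes a grayscale image stored as nested lists and a precomputed mask (1 = foreground, 0 = background); how the masks are actually produced for GTSRB and MS-COCO is not shown, and `apply_attention_mask` is a hypothetical name.

```python
from typing import List

Grid = List[List[float]]


def apply_attention_mask(image: Grid, mask: Grid) -> Grid:
    """Zero out background pixels (mask == 0) so only the foreground reaches the classifier."""
    if len(image) != len(mask) or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    return [[p * m for p, m in zip(row, mrow)] for row, mrow in zip(image, mask)]


if __name__ == "__main__":
    mask = [[0, 1], [0, 1]]  # right column is foreground
    clean = [[0.2, 0.9], [0.4, 0.7]]
    noisy = [[0.5, 0.9], [0.1, 0.7]]  # perturbation confined to the background
    print(apply_attention_mask(clean, mask) == apply_attention_mask(noisy, mask))
```

Any perturbation confined to the background is erased before classification, which mirrors what discretization does to MNIST backgrounds; perturbations on the foreground still pass through.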
CV · Sep 12, 2019
Towards Model-Agnostic Adversarial Defenses using Adversarially Trained Autoencoders
Pratik Vaishnavi, Kevin Eykholt, Atul Prakash et al.
Adversarial machine learning is a well-studied field of research in which an adversary causes predictable errors in a machine learning algorithm through precise manipulation of the input. Numerous techniques have been proposed to harden machine learning algorithms and mitigate the effect of adversarial attacks. Of these techniques, adversarial training, which augments the training data with adversarial samples, has proven to be an effective defense against a certain class of attacks. However, adversarial training is computationally expensive and its improvements are limited to a single model. In this work, we take a first step toward creating a model-agnostic adversarial defense. We propose Adversarially-Trained Autoencoder Augmentation (AAA), the first model-agnostic adversarial defense that is robust against certain adaptive adversaries. We show that AAA achieves a partially model-agnostic defense by training a single autoencoder to protect multiple pre-trained classifiers, reaching adversarial performance on par with or better than adversarial training without modifying the classifiers. Furthermore, we demonstrate that AAA can provide a fully model-agnostic defense for the MNIST and Fashion MNIST datasets by improving the adversarial performance of a previously unseen pre-trained classifier by at least 45% with no additional training. Finally, using a natural image corruption dataset, we show that our approach improves robustness to naturally corrupted images, which has been identified as strongly indicative of true adversarial robustness.
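The partially model-agnostic setup places one shared autoencoder in front of several unmodified classifiers, so each protected model computes `classifier(autoencoder(x))`. The sketch below shows only that wiring; `toy_autoencoder`, `clf_first`, and `clf_sum` are hypothetical toy callables, and the adversarial training of the autoencoder itself is not shown.

```python
from typing import Callable, List

Vector = List[float]
Classifier = Callable[[Vector], int]


def protect(autoencoder: Callable[[Vector], Vector], classifier: Classifier) -> Classifier:
    """Wrap an unmodified pre-trained classifier behind a shared preprocessing autoencoder."""
    return lambda x: classifier(autoencoder(x))


def toy_autoencoder(x: Vector) -> Vector:
    """Hypothetical stand-in for a trained autoencoder: snaps each feature to 0 or 1."""
    return [1.0 if v >= 0.5 else 0.0 for v in x]


def clf_first(x: Vector) -> int:
    """Hypothetical pre-trained classifier #1."""
    return int(x[0] > 0.7)


def clf_sum(x: Vector) -> int:
    """Hypothetical pre-trained classifier #2."""
    return int(sum(x) > 1.5)


if __name__ == "__main__":
    adversarial = [0.65, 0.75]  # a perturbed version of the clean input [0.9, 0.9]
    for clf in (clf_first, clf_sum):
        guarded = protect(toy_autoencoder, clf)
        print(clf.__name__, clf(adversarial), guarded(adversarial))
```

Because the classifiers are never touched, adding another pre-trained model only requires another `protect` call with the same autoencoder.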
LG · May 27, 2019
Analyzing the Interpretability Robustness of Self-Explaining Models
Haizhong Zheng, Earlence Fernandes, Atul Prakash
Recently, interpretable models called self-explaining models (SEMs) have been proposed with the goal of providing interpretability robustness. We evaluate the interpretability robustness of SEMs and show that the explanations provided by SEMs, as currently proposed, are not robust to adversarial inputs. Specifically, we successfully created adversarial inputs that do not change the model outputs but cause significant changes in the explanations. We find that even though current SEMs use stable coefficients for mapping explanations to output labels, they do not consider the robustness of the first stage of the model, which creates interpretable basis concepts from the input, and this leads to non-robust explanations. Our work makes the case that future work should examine how to generate interpretable basis concepts in a robust way.
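Self-explaining models are often written as two stages: a concept encoder h(x) produces interpretable basis concepts, a relevance function θ(x) supplies coefficients, and the output is the dot product of θ(x) and h(x). Assuming that form (the abstract does not spell it out), the toy below shows how two inputs can produce the same output while their concept-level explanations differ, even with perfectly stable coefficients. `toy_concepts` and `toy_relevances` are hypothetical.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def sem_forward(
    concepts: Callable[[Vector], Vector],
    relevances: Callable[[Vector], Vector],
    x: Vector,
) -> Tuple[float, List[Tuple[float, float]]]:
    """Return the score theta(x) . h(x) and the explanation as (concept, coefficient) pairs."""
    h = concepts(x)  # first stage: interpretable basis concepts
    theta = relevances(x)  # coefficients mapping concepts to the output
    if len(h) != len(theta):
        raise ValueError("concepts and relevances must have equal length")
    score = sum(t * c for t, c in zip(theta, h))
    return score, list(zip(h, theta))


def toy_concepts(x: Vector) -> Vector:
    """Hypothetical concept encoder: uses the raw features as concepts."""
    return list(x)


def toy_relevances(x: Vector) -> Vector:
    """Hypothetical, perfectly stable coefficients that ignore the input."""
    return [1.0, 1.0]


if __name__ == "__main__":
    for x in ([0.75, 0.25], [0.25, 0.75]):
        print(sem_forward(toy_concepts, toy_relevances, x))
```

Stable coefficients alone do not make the explanation stable: if an attacker can move h(x) while keeping the score fixed, the explanation changes without any change in the output.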
LG · May 26, 2019
Robust Classification using Robust Feature Augmentation
Kevin Eykholt, Swati Gupta, Atul Prakash et al.
Existing deep neural networks (DNNs), such as those for image classification, have been shown to be vulnerable to adversarial images that cause a DNN misclassification without any perceptible change to the image. In this work, we propose shock-absorbing robust features, such as binarization (e.g., rounding) and group extraction (e.g., color or shape), to augment the classification pipeline, resulting in more robust classifiers. Experimentally, we show that augmenting ML models with these techniques leads to improved overall robustness on adversarial inputs as well as significant improvements in training time. On the MNIST dataset, we achieved a 14x speedup in the training time needed to obtain 90% adversarial accuracy compared to the state-of-the-art adversarial training method of Madry et al., and retained higher adversarial accuracy over a broader range of attacks. We also find robustness improvements on traffic sign classification using robust feature augmentation. Finally, we give theoretical insights into why robust feature augmentation can be expected to reduce the adversarial input space.
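Binarization acts as a shock absorber because thresholding maps a whole interval of inputs to the same output: if a pixel lies within distance d of 0 or 1 (with d < 0.5), any perturbation of magnitude less than 0.5 − d leaves its binarized value unchanged. The sketch assumes pixels scaled to [0, 1] and a fixed threshold of 0.5; `binarize` and `perturb` are hypothetical helpers, not the authors' code, and group extraction is not shown.

```python
from typing import List


def binarize(pixels: List[float], threshold: float = 0.5) -> List[int]:
    """Map each pixel in [0, 1] to 0 or 1 (a hypothetical robust-feature stage)."""
    return [1 if p >= threshold else 0 for p in pixels]


def perturb(pixels: List[float], signs: List[int], eps: float) -> List[float]:
    """Add a bounded perturbation of magnitude eps per pixel, clipped to [0, 1]."""
    return [min(1.0, max(0.0, p + s * eps)) for p, s in zip(pixels, signs)]


if __name__ == "__main__":
    clean = [0.0, 0.05, 0.95, 1.0]  # MNIST-like pixels close to 0 or 1
    adv = perturb(clean, [1, 1, -1, -1], eps=0.3)
    print(binarize(clean), binarize(adv))  # both [0, 0, 1, 1]
```

The protection depends on the data: it holds for MNIST-like images whose pixels sit near 0 or 1, but a pixel close to the threshold (e.g., 0.45) can be flipped by a small perturbation.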
CR · Jan 29, 2019
Beware of the App! On the Vulnerability Surface of Smart Devices through their Companion Apps
Davino Mauro Junior, Luis Melo, Harvey Lu et al.
Internet of Things (IoT) devices are becoming increasingly important. These devices are often resource-limited, hindering rigorous enforcement of security policies. Assessing the vulnerability of IoT devices is an important problem, but analyzing their firmware is difficult for a variety of reasons, including the need to purchase the devices. This paper finds that analyzing the companion apps of these devices for clues to security vulnerabilities can be an effective strategy. Compared to device hardware and firmware, these apps are easy to download and analyze. A key finding of this study is that the communication between an IoT device and its app is often not properly encrypted and authenticated, and these issues enable the construction of exploits to remotely control the devices. To confirm the vulnerabilities found, we created exploits against five popular IoT devices from Amazon using a combination of static and dynamic analyses. We also conducted a larger study and found that analyzing 96 popular IoT devices required analyzing only 32 companion apps. Among the conservative findings, 50% of the apps, corresponding to 38% of the devices, did not use proper encryption techniques to secure device-to-companion-app communication. Finally, we discuss defense strategies that developers can adopt to address the lessons from our work.
LG · Dec 17, 2018
Designing Adversarially Resilient Classifiers using Resilient Feature Engineering
Kevin Eykholt, Atul Prakash
We provide a methodology, resilient feature engineering, for creating adversarially resilient classifiers. According to existing work, adversarial attacks identify weakly correlated or non-predictive features learned by the classifier during training and design the adversarial noise to exploit these features. Therefore, highly predictive features should be used first during classification to determine the set of possible output labels. Our methodology reframes the problem of designing resilient classifiers as a problem of designing resilient feature extractors for these highly predictive features. We provide two theorems that support our methodology. The Serial Composition Resilience and Parallel Composition Resilience theorems show that the outputs of adversarially resilient feature extractors can be combined to create an equally resilient classifier. Based on our theoretical results, we outline the design of an adversarially resilient classifier.
CR · Oct 31, 2018
Securing IoT Apps with Fine-grained Control of Information Flows
Davino Mauro Junior, Kiev Gama, Atul Prakash
The Internet of Things is growing rapidly, with many connected devices now available to consumers. With this growth, the IoT apps that manage the devices from smartphones raise significant security concerns. Typically, these apps are secured via sensitive credentials, such as an email address and password, that need to be validated through specific servers, thus requiring permission to access the Internet. Unfortunately, even when developers are well-intentioned, such apps can be non-trivial to secure in a way that guarantees users' credentials do not leak to unauthorized servers on the Internet. For example, if the app relies on third-party libraries, as many do, those libraries can potentially capture and leak sensitive credentials. Bugs in the applications can also result in exploitable vulnerabilities that leak credentials. This paper presents our work in progress on a prototype that enables developers to control how information flows within the app from sensitive UI data to specific servers. We extend FlowFence to enforce fine-grained information flow policies on sensitive UI data.
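A fine-grained information flow policy of this kind can be pictured as a map from each sensitive UI field to the servers allowed to receive it, checked wherever data would leave the app. The sketch below is a hypothetical, language-level illustration of such a policy check; it does not use or model FlowFence's actual API, and `FlowPolicy`, `send`, the field names, and the hostnames are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


class FlowViolation(Exception):
    """Raised when sensitive UI data would flow to a server the policy does not allow."""


@dataclass
class FlowPolicy:
    # Hypothetical policy: UI field name -> hostnames allowed to receive its value.
    allowed: Dict[str, FrozenSet[str]] = field(default_factory=dict)

    def check(self, ui_field: str, host: str) -> None:
        # Fields without an entry may not leave the app at all (deny by default).
        if host not in self.allowed.get(ui_field, frozenset()):
            raise FlowViolation(f"{ui_field!r} may not be sent to {host!r}")


def send(policy: FlowPolicy, ui_field: str, value: str, host: str) -> str:
    """Stand-in for a network call: enforce the policy before the value leaves the app."""
    policy.check(ui_field, host)
    return f"sent {len(value)} chars of {ui_field} to {host}"


if __name__ == "__main__":
    policy = FlowPolicy({"password": frozenset({"auth.example.com"})})
    print(send(policy, "password", "hunter2", "auth.example.com"))
    try:
        send(policy, "password", "hunter2", "ads.example.net")  # e.g., a third-party library
    except FlowViolation as err:
        print("blocked:", err)
```

The design choice worth noting is deny-by-default: a credential field with no policy entry cannot be sent anywhere, so a buggy or malicious library gains nothing by trying.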
CR · Jul 20, 2018
Physical Adversarial Examples for Object Detectors
Kevin Eykholt, Ivan Evtimov, Earlence Fernandes et al.
Deep neural networks (DNNs) are vulnerable to adversarial examples: maliciously crafted inputs that cause DNNs to make incorrect predictions. Recent work has shown that these attacks generalize to the physical domain, where perturbations on physical objects fool image classifiers under a variety of real-world conditions. Such attacks pose a risk to deep learning models used in safety-critical cyber-physical systems. In this work, we extend physical attacks to more challenging object detection models, a broader class of deep learning algorithms widely used to detect and label multiple objects within a scene. Improving upon a previous physical attack on image classifiers, we create perturbed physical objects that are either ignored or mislabeled by object detection models. We implement a Disappearance Attack, in which we cause a Stop sign to "disappear" according to the detector, either by covering the sign with an adversarial Stop sign poster or by adding adversarial stickers onto the sign. In a video recorded in a controlled lab environment, the state-of-the-art YOLOv2 detector failed to recognize these adversarial Stop signs in over 85% of the video frames. In an outdoor experiment, YOLO was fooled by the poster and sticker attacks in 72.5% and 63.5% of the video frames, respectively. We also use Faster R-CNN, a different object detection model, to demonstrate the transferability of our adversarial perturbations. The poster perturbation fools Faster R-CNN in 85.9% of the video frames in a controlled lab environment and in 40.2% of the video frames in an outdoor environment. Finally, we present preliminary results with a new Creation Attack, wherein innocuous physical stickers fool a model into detecting nonexistent objects.
CR · Jan 14, 2018
Tyche: Risk-Based Permissions for Smart Home Platforms
Amir Rahmati, Earlence Fernandes, Kevin Eykholt et al.
Emerging smart home platforms, which interface with a variety of physical devices and support third-party application development, currently use permission models inspired by smartphone operating systems: they group functionally similar device operations into separate units and require users to grant apps access to devices at that granularity. Unfortunately, this leads to two issues: (1) apps that do not require access to all of the granted device operations have overprivileged access to them, and (2) apps might pose a higher risk to users than needed, because physical device operations are fundamentally risk-asymmetric: "door.unlock" can let burglars in, while "door.lock" can potentially get the user locked out. Overprivileged apps with access to mixed-risk operations only increase the potential for damage. We present Tyche, a system that leverages the risk asymmetry in physical device operations to limit the risk that apps pose to smart home users, without increasing the user's decision overhead. Tyche introduces the notion of risk-based permissions. With risk-based permissions, device operations are grouped into units of similar risk, and users grant apps access to devices at that risk-based granularity. Starting from a set of permissions derived from the popular Samsung SmartThings platform, we conduct a user study involving domain experts and Mechanical Turk users to compute a relative ranking of the risks associated with device operations. We find that user assessment of risk closely matches that of domain experts. Using this ranking, we define risk-based groupings of device operations and apply them to existing SmartThings apps, showing that risk-based permissions indeed limit risk if apps are malicious or exploitable.
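Risk-based permissions regroup device operations by risk rather than by function, so an app is granted a (device, risk group) pair instead of a whole capability. The sketch below assumes a three-level scale and a hypothetical operation-to-risk table chosen purely for illustration; it is not the ranking derived from the user study, and `Risk`, `RISK`, and `is_allowed` are made-up names.

```python
from enum import Enum
from typing import Dict, Set, Tuple


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical, illustrative risk groups (not the ranking from the user study).
# door.lock and door.unlock land in different groups, reflecting risk asymmetry.
RISK: Dict[str, Risk] = {
    "light.on": Risk.LOW,
    "light.off": Risk.LOW,
    "door.lock": Risk.MEDIUM,
    "door.unlock": Risk.HIGH,
}

Grant = Tuple[str, Risk]  # (device, risk group) granted to an app


def is_allowed(grants: Set[Grant], operation: str) -> bool:
    """Allow an operation only if the app holds the grant for its (device, risk group)."""
    risk = RISK.get(operation)
    if risk is None:
        return False  # unknown operations are denied by default
    device = operation.split(".", 1)[0]
    return (device, risk) in grants


if __name__ == "__main__":
    # An app that only needs to lock the door never gets the ability to unlock it.
    app_grants = {("door", Risk.MEDIUM), ("light", Risk.LOW)}
    for op in ("door.lock", "door.unlock", "light.off"):
        print(op, is_allowed(app_grants, op))
```

Under a functional grouping, a single "lock" capability would bundle both operations; splitting by risk lets a user grant the lower-risk half alone.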
CR · Dec 21, 2017
Note on Attacking Object Detectors with Adversarial Stickers
Kevin Eykholt, Ivan Evtimov, Earlence Fernandes et al.
Deep learning has proven to be a powerful tool for computer vision and has seen widespread adoption for numerous tasks. However, deep learning algorithms are known to be vulnerable to adversarial examples. These adversarial inputs are crafted so that, when provided to a deep learning algorithm, they are very likely to be mislabeled. This can be problematic when deep learning is used to assist in safety-critical decisions. Recent research has shown that classifiers can be attacked by physical adversarial examples under various physical conditions. Although state-of-the-art object detection algorithms are harder to fool with the same set of adversarial examples, here we show that these detectors can also be attacked by physical adversarial examples. In this note, we briefly present both static and dynamic test results. We design an algorithm that produces physical adversarial inputs, which can fool the YOLO object detector and can also attack Faster-RCNN with a relatively high success rate based on transferability. Furthermore, our algorithm can compress the adversarial inputs into stickers that, when attached to the targeted object, cause the detector to either mislabel or fail to detect the object a high percentage of the time. This note provides a small set of results. Our upcoming paper will contain a thorough evaluation on other object detectors and will present the algorithm.
CR · Sep 8, 2017
IFTTT vs. Zapier: A Comparative Study of Trigger-Action Programming Frameworks
Amir Rahmati, Earlence Fernandes, Jaeyeon Jung et al.
The growing popularity of online services and IoT platforms, along with developers' increased access to devices and services through RESTful APIs, is giving rise to a new class of frameworks that support trigger-action programming. These frameworks provide an interface for end users to bridge different RESTful APIs in a trigger-action model and easily create automated tasks across diverse platforms. Past work has characterized the space of user-created trigger-action combinations in the context of IFTTT, a popular trigger-action framework. In this work, we characterize the space of possible functionality that such frameworks open up to end users in the context of two major frameworks, IFTTT and Zapier, and discuss results from our comparative analysis of these frameworks. We create a snapshot of 6406 triggers and actions from 1051 channels/apps across these two frameworks and compare the available functions, the distribution of channels, and the functions shared between them. We examine users' ability to define their own channels, triggers, and actions; analyze the growth of these frameworks; and discuss future research opportunities in this domain.
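In the trigger-action model, a user-created rule pairs an event from one channel (the trigger) with an operation on another (the action), and the framework runs the action whenever a matching event arrives. The toy dispatcher below illustrates only that model; `Rule`, `dispatch`, and the channel and event names are hypothetical, and neither IFTTT's nor Zapier's real APIs are used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Payload = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    """A hypothetical user-created recipe: if <trigger> then <action>."""
    trigger_channel: str
    trigger_event: str
    action: Callable[[Payload], str]


def dispatch(rules: List[Rule], channel: str, event: str, payload: Payload) -> List[str]:
    """Run the action of every rule whose trigger matches the incoming event."""
    return [
        rule.action(payload)
        for rule in rules
        if rule.trigger_channel == channel and rule.trigger_event == event
    ]


if __name__ == "__main__":
    recipe = Rule("smoke_alarm", "smoke_detected", lambda p: "turn off oven in " + p["room"])
    print(dispatch([recipe], "smoke_alarm", "smoke_detected", {"room": "kitchen"}))
```

In a real framework the trigger arrives from one service's API and the action calls another's; the set of available triggers and actions per channel is exactly the functionality space the comparison above measures.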
CR · Jul 27, 2017
Robust Physical-World Attacks on Deep Learning Models
Kevin Eykholt, Ivan Evtimov, Earlence Fernandes et al.
Recent studies show that state-of-the-art deep neural networks (DNNs) are vulnerable to adversarial examples, which result from small-magnitude perturbations added to the input. Given that emerging physical systems are using DNNs in safety-critical situations, adversarial examples could mislead these systems and cause dangerous situations. Therefore, understanding adversarial examples in the physical world is an important step towards developing resilient learning algorithms. We propose a general attack algorithm, Robust Physical Perturbations (RP2), to generate robust visual adversarial perturbations under different physical conditions. Using the real-world case of road sign classification, we show that adversarial examples generated using RP2 achieve high targeted misclassification rates against standard-architecture road sign classifiers in the physical world under various environmental conditions, including viewpoints. Due to the current lack of a standardized testing method, we propose a two-stage evaluation methodology for robust physical adversarial examples consisting of lab and field tests. Using this methodology, we evaluate the efficacy of physical adversarial manipulations on real objects. With a perturbation in the form of only black and white stickers, we attack a real stop sign, causing targeted misclassification in 100% of the images obtained in lab settings and in 84.8% of the captured video frames obtained on a moving vehicle (field test) for the target classifier.
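Restricting a perturbation to sticker-shaped regions can be expressed with a binary mask: the adversarial image is the clean image plus the mask times the perturbation, clipped to the valid pixel range. The sketch below shows only that composition step, assuming grayscale pixels in [0, 1]; the optimization that chooses the perturbation values is not shown, and `apply_sticker` is a hypothetical name.

```python
from typing import List


def apply_sticker(image: List[float], delta: List[float], mask: List[int]) -> List[float]:
    """Add the perturbation only where mask == 1 (sticker area) and clip to [0, 1]."""
    if not len(image) == len(delta) == len(mask):
        raise ValueError("image, delta and mask must have the same length")
    return [min(1.0, max(0.0, x + m * d)) for x, d, m in zip(image, delta, mask)]


if __name__ == "__main__":
    sign_pixels = [0.5, 0.5, 0.5]
    delta = [0.75, -0.25, 0.25]  # hypothetical optimized values
    mask = [1, 0, 1]  # the middle pixel lies outside the sticker
    print(apply_sticker(sign_pixels, delta, mask))  # [1.0, 0.5, 0.75]
```

Pixels outside the mask are untouched regardless of `delta`, so whatever the optimizer finds can be printed and physically applied as stickers.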
CR · Jul 3, 2017
Decoupled-IFTTT: Constraining Privilege in Trigger-Action Platforms for the Internet of Things
Earlence Fernandes, Amir Rahmati, Jaeyeon Jung et al.
Trigger-action platforms are an emerging class of web-based systems that enable users to create automation rules (or recipes) of the form "If there is a smoke alarm, then turn off my oven." These platforms stitch together various online services, including Internet of Things devices, social networks, and productivity tools, by obtaining OAuth tokens on behalf of users. Unfortunately, these platforms also introduce a long-term security risk: if they are compromised, the attacker can misuse the OAuth tokens belonging to millions of users to arbitrarily manipulate their devices and data. In this work, we first quantify the risk users face in the context of If-This-Then-That (IFTTT). We perform the first empirical analysis of the OAuth-based authorization model of IFTTT using semi-automated tools that we built to overcome the challenges of IFTTT's closed-source nature and of online service API inconsistencies. We find that 75% of IFTTT's channels, an abstraction of online services, use overprivileged OAuth tokens, increasing risks in the event of a compromise. Even if the OAuth tokens were correctly scoped, a compromise of IFTTT would not prevent their misuse. Motivated by this empirical analysis, we design and evaluate Decoupled-IFTTT (dIFTTT), the first trigger-action platform where users do not have to give it highly privileged access to their online services. Our design pushes the notion of fine-grained OAuth tokens to its extreme and ensures that even if the cloud service is controlled by the attacker, it cannot misuse the OAuth tokens to invoke unauthorized actions. Our evaluation establishes that dIFTTT imposes modest overhead: it adds less than 15 ms of latency to recipe execution time and reduces throughput by 2.5%.
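Pushing fine-grained tokens to the extreme means a token authorizes one specific action with fixed arguments rather than broad access to a service. The sketch below is a hypothetical illustration of that idea: the online service mints an HMAC-SHA256 token bound to a single (action, arguments) pair and rejects any other request presented with it, so a compromised platform holding the token cannot repurpose it. The token format, key handling, and the names `issue_token` and `authorize` are assumptions, not dIFTTT's actual protocol.

```python
import hashlib
import hmac
import json
from typing import Any, Dict

# Hypothetical secret held only by the online service, never by the trigger-action platform.
SERVICE_KEY = b"demo-key-for-illustration-only"


def _canonical(action: str, args: Dict[str, Any]) -> bytes:
    """Serialize an action request deterministically so it can be bound to a token."""
    return json.dumps({"action": action, "args": args}, sort_keys=True).encode()


def issue_token(action: str, args: Dict[str, Any]) -> str:
    """Service side: mint a token that is valid only for this exact action and arguments."""
    return hmac.new(SERVICE_KEY, _canonical(action, args), hashlib.sha256).hexdigest()


def authorize(token: str, action: str, args: Dict[str, Any]) -> bool:
    """Service side: accept a request only if the token was minted for exactly this request."""
    return hmac.compare_digest(token, issue_token(action, args))


if __name__ == "__main__":
    tok = issue_token("oven.turn_off", {"oven_id": "kitchen"})
    print(authorize(tok, "oven.turn_off", {"oven_id": "kitchen"}))  # True
    print(authorize(tok, "oven.turn_on", {"oven_id": "kitchen"}))  # False
```

`hmac.compare_digest` is used instead of `==` to avoid leaking timing information during the comparison.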
CR · May 23, 2017
Internet of Things Security Research: A Rehash of Old Ideas or New Intellectual Challenges?
Earlence Fernandes, Amir Rahmati, Kevin Eykholt et al.
The Internet of Things (IoT) is a new computing paradigm that spans wearable devices, homes, hospitals, cities, transportation, and critical infrastructure. Building security into this new computing paradigm is a major technical challenge today. However, what are the security problems in IoT that we can solve using existing security principles? And, what are the new problems and challenges in this space that require new security mechanisms? This article summarizes the intellectual similarities and differences between classic information technology security research and IoT security research.
CR · Jan 27, 2014
Anception: Application Virtualization For Android
Earlence Fernandes, Alexander Crowell, Ajit Aluri et al.
Malware has become a significant problem on Android devices. Library operating systems and application virtualization are both possible solutions for confining malware. Unfortunately, such solutions do not exist for Android. Designing mechanisms for application virtualization is a significant challenge for several reasons: (1) graphics performance is important due to the popularity of games, and (2) applications with the same UID can share state. This paper presents Anception, the first flexible application virtualization framework for Android. It is implemented as a modification to the Android kernel and supports application virtualization that addresses the above requirements. Anception is able to confine many types of malware while supporting unmodified Android applications. Our Anception-based system exhibits up to 3.9% overhead on various 2D/3D benchmarks and 1.8% overhead on the SunSpider benchmark.