Ali Shafahi

LG
h-index26
17papers
3,873citations
Novelty55%
AI Score41

17 Papers

LGSep 30, 2022
Improving Robustness with Adaptive Weight Decay

Amin Ghiasi, Ali Shafahi, Reza Ardekani

We propose adaptive weight decay, which automatically tunes the hyper-parameter for weight decay during each training iteration. For classification problems, we propose changing the value of the weight decay hyper-parameter on the fly based on the strength of updates from the classification loss (i.e., gradient of cross-entropy), and the regularization loss (i.e., $\ell_2$-norm of the weights). We show that this simple modification can result in large improvements in adversarial robustness -- an area which suffers from robust overfitting -- without requiring extra data across various datasets and architecture choices. For example, our reformulation results in $20\%$ relative robustness improvement for CIFAR-100, and $10\%$ relative robustness improvement on CIFAR-10 comparing to the best tuned hyper-parameters of traditional weight decay resulting in models that have comparable performance to SOTA robustness methods. In addition, this method has other desirable properties, such as less sensitivity to learning rate, and smaller weight norms, which the latter contributes to robustness to overfitting to label noise, and pruning.

LGApr 29, 2019Code
Adversarial Training for Free!

Ali Shafahi, Mahyar Najibi, Amin Ghiasi et al.

Adversarial training, in which a network is trained on adversarial examples, is one of the few defenses against adversarial attacks that withstands strong attacks. Unfortunately, the high cost of generating strong adversarial examples makes standard adversarial training impractical on large-scale problems like ImageNet. We present an algorithm that eliminates the overhead cost of generating adversarial examples by recycling the gradient information computed when updating model parameters. Our "free" adversarial training algorithm achieves comparable robustness to PGD adversarial training on the CIFAR-10 and CIFAR-100 datasets at negligible additional cost compared to natural training, and can be 7 to 30 times faster than other strong adversarial training methods. Using a single workstation with 4 P100 GPUs and 2 days of runtime, we can train a robust model for the large-scale ImageNet classification task that maintains 40% accuracy against PGD attacks. The code is available at https://github.com/ashafahi/free_adv_train.

CVAug 13, 2025
Stable Diffusion Models are Secretly Good at Visual In-Context Learning

Trevine Oorloff, Vishwanath Sindagi, Wele Gedara Chaminda Bandara et al.

Large language models (LLM) in natural language processing (NLP) have demonstrated great potential for in-context learning (ICL) -- the ability to leverage a few sets of example prompts to adapt to various tasks without having to explicitly update the model weights. ICL has recently been explored for computer vision tasks with promising early outcomes. These approaches involve specialized training and/or additional data that complicate the process and limit its generalizability. In this work, we show that off-the-shelf Stable Diffusion models can be repurposed for visual in-context learning (V-ICL). Specifically, we formulate an in-place attention re-computation within the self-attention layers of the Stable Diffusion architecture that explicitly incorporates context between the query and example prompts. Without any additional fine-tuning, we show that this repurposed Stable Diffusion model is able to adapt to six different tasks: foreground segmentation, single object detection, semantic segmentation, keypoint detection, edge detection, and colorization. For example, the proposed approach improves the mean intersection over union (mIoU) for the foreground segmentation task on Pascal-5i dataset by 8.9% and 3.2% over recent methods such as Visual Prompting and IMProv, respectively. Additionally, we show that the proposed method is able to effectively leverage multiple prompts through ensembling to infer the task better and further improve the performance.

LGOct 14, 2020
Towards Accurate Quantization and Pruning via Data-free Knowledge Transfer

Chen Zhu, Zheng Xu, Ali Shafahi et al.

When large scale training data is available, one can obtain compact and accurate networks to be deployed in resource-constrained environments effectively through quantization and pruning. However, training data are often protected due to privacy concerns and it is challenging to obtain compact networks without data. We study data-free quantization and pruning by transferring knowledge from trained large networks to compact networks. Auxiliary generators are simultaneously and adversarially trained with the targeted compact networks to generate synthetic inputs that maximize the discrepancy between the given large network and its quantized or pruned version. We show theoretically that the alternating optimization for the underlying minimax problem converges under mild conditions for pruning and quantization. Our data-free compact networks achieve competitive accuracy to networks trained and fine-tuned with training data. Our quantized and pruned networks achieve good performance while being more compact and lightweight. Further, we demonstrate that the compact structure and corresponding initialization from the Lottery Ticket Hypothesis can also help in data-free training.

LGMay 30, 2020
Exploring Model Robustness with Adaptive Networks and Improved Adversarial Training

Zheng Xu, Ali Shafahi, Tom Goldstein

Adversarial training has proven to be effective in hardening networks against adversarial examples. However, the gained robustness is limited by network capacity and number of training samples. Consequently, to build more robust models, it is common practice to train on widened networks with more parameters. To boost robustness, we propose a conditional normalization module to adapt networks when conditioned on input samples. Our adaptive networks, once adversarially trained, can outperform their non-adaptive counterparts on both clean validation accuracy and robustness. Our method is objective agnostic and consistently improves both the conventional adversarial training objective and the TRADES objective. Our adaptive networks also outperform larger widened non-adaptive architectures that have 1.5 times more parameters. We further introduce several practical ``tricks'' in adversarial training to improve robustness and empirically verify their efficiency.

LGMar 19, 2020
Breaking certified defenses: Semantic adversarial examples with spoofed robustness certificates

Amin Ghiasi, Ali Shafahi, Tom Goldstein

To deflect adversarial attacks, a range of "certified" classifiers have been proposed. In addition to labeling an image, certified classifiers produce (when possible) a certificate guaranteeing that the input image is not an $\ell_p$-bounded adversarial example. We present a new attack that exploits not only the labelling function of a classifier, but also the certificate generator. The proposed method applies large perturbations that place images far from a class boundary while maintaining the imperceptibility property of adversarial examples. The proposed "Shadow Attack" causes certifiably robust networks to mislabel an image and simultaneously produce a "spoofed" certificate of robustness.

LGNov 18, 2019
WITCHcraft: Efficient PGD attacks with random step size

Ping-Yeh Chiang, Jonas Geiping, Micah Goldblum et al.

State-of-the-art adversarial attacks on neural networks use expensive iterative methods and numerous random restarts from different initial points. Iterative FGSM-based methods without restarts trade off performance for computational efficiency because they do not adequately explore the image space and are highly sensitive to the choice of step size. We propose a variant of Projected Gradient Descent (PGD) that uses a random step size to improve performance without resorting to expensive random restarts. Our method, Wide Iterative Stochastic crafting (WITCHcraft), achieves results superior to the classical PGD attack on the CIFAR-10 and MNIST data sets but without additional computational cost. This simple modification of PGD makes crafting attacks more economical, which is important in situations like adversarial training where attacks need to be crafted in real time.

LGOct 25, 2019
Label Smoothing and Logit Squeezing: A Replacement for Adversarial Training?

Ali Shafahi, Amin Ghiasi, Furong Huang et al.

Adversarial training is one of the strongest defenses against adversarial attacks, but it requires adversarial examples to be generated for every mini-batch during optimization. The expense of producing these examples during training often precludes adversarial training from use on complex image datasets. In this study, we explore the mechanisms by which adversarial training improves classifier robustness, and show that these mechanisms can be effectively mimicked using simple regularization methods, including label smoothing and logit squeezing. Remarkably, using these simple regularization methods in combination with Gaussian noise injection, we are able to achieve strong adversarial robustness -- often exceeding that of adversarial training -- using no adversarial examples.

LGJun 17, 2019
Adversarial attacks on Copyright Detection Systems

Parsa Saadatpanah, Ali Shafahi, Tom Goldstein

It is well-known that many machine learning models are susceptible to adversarial attacks, in which an attacker evades a classifier by making small perturbations to inputs. This paper discusses how industrial copyright detection tools, which serve a central role on the web, are susceptible to adversarial attacks. We discuss a range of copyright detection systems, and why they are particularly vulnerable to attacks. These vulnerabilities are especially apparent for neural network based systems. As a proof of concept, we describe a well-known music identification method, and implement this system in the form of a neural net. We then attack this system using simple gradient methods. Adversarial music created this way successfully fools industrial systems, including the AudioTag copyright detector and YouTube's Content ID system. Our goal is to raise awareness of the threats posed by adversarial examples in this space, and to highlight the importance of hardening copyright detection systems to attacks.

LGMay 20, 2019
Adversarially robust transfer learning

Ali Shafahi, Parsa Saadatpanah, Chen Zhu et al.

Transfer learning, in which a network is trained on one task and re-purposed on another, is often used to produce neural network classifiers when data is scarce or full-scale training is too costly. When the goal is to produce a model that is not only accurate but also adversarially robust, data scarcity and computational limitations become even more cumbersome. We consider robust transfer learning, in which we transfer not only performance but also robustness from a source model to a target domain. We start by observing that robust networks contain robust feature extractors. By training classifiers on top of these feature extractors, we produce new models that inherit the robustness of their parent networks. We then consider the case of fine tuning a network by re-training end-to-end in the target domain. When using lifelong learning strategies, this process preserves the robustness of the source network while achieving high accuracy. By using such strategies, it is possible to produce accurate and robust models with little data, and without the cost of adversarial training. Additionally, we can improve the generalization of adversarially trained models, while maintaining their robustness.

MLMay 15, 2019
Transferable Clean-Label Poisoning Attacks on Deep Neural Nets

Chen Zhu, W. Ronny Huang, Ali Shafahi et al.

Clean-label poisoning attacks inject innocuous looking (and "correctly" labeled) poison images into training data, causing a model to misclassify a targeted image after being trained on this data. We consider transferable poisoning attacks that succeed without access to the victim network's outputs, architecture, or (in some cases) training data. To achieve this, we propose a new "polytope attack" in which poison images are designed to surround the targeted image in feature space. We also demonstrate that using Dropout during poison creation helps to enhance transferability of this attack. We achieve transferable attack success rates of over 50% while poisoning only 1% of the training set.

CVNov 27, 2018
Universal Adversarial Training

Ali Shafahi, Mahyar Najibi, Zheng Xu et al.

Standard adversarial attacks change the predicted class label of a selected image by adding specially tailored small perturbations to its pixels. In contrast, a universal perturbation is an update that can be added to any image in a broad class of images, while still changing the predicted class label. We study the efficient generation of universal adversarial perturbations, and also efficient methods for hardening networks to these attacks. We propose a simple optimization-based universal attack that reduces the top-1 accuracy of various network architectures on ImageNet to less than 20%, while learning the universal perturbation 13X faster than the standard method. To defend against these perturbations, we propose universal adversarial training, which models the problem of robust classifier generation as a two-player min-max game, and produces robust models with only 2X the cost of natural training. We also propose a simultaneous stochastic gradient method that is almost free of extra computation, which allows us to do universal adversarial training on ImageNet.

LGSep 6, 2018
Are adversarial examples inevitable?

Ali Shafahi, W. Ronny Huang, Christoph Studer et al.

A wide range of defenses have been proposed to harden neural networks against adversarial attacks. However, a pattern has emerged in which the majority of adversarial defenses are quickly broken by new attacks. Given the lack of success at generating robust defenses, we are led to ask a fundamental question: Are adversarial attacks inevitable? This paper analyzes adversarial examples from a theoretical perspective, and identifies fundamental bounds on the susceptibility of a classifier to adversarial attacks. We show that, for certain classes of problems, adversarial examples are inescapable. Using experiments, we explore the implications of theoretical guarantees for real-world problems and discuss how factors such as dimensionality and image complexity limit a classifier's robustness against adversarial examples.

LGApr 3, 2018
Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks

Ali Shafahi, W. Ronny Huang, Mahyar Najibi et al.

Data poisoning is an attack on machine learning models wherein the attacker adds examples to the training set to manipulate the behavior of the model at test time. This paper explores poisoning attacks on neural nets. The proposed attacks use "clean-labels"; they don't require the attacker to have any control over the labeling of training data. They are also targeted; they control the behavior of the classifier on a $\textit{specific}$ test instance without degrading overall classifier performance. For example, an attacker could add a seemingly innocuous image (that is properly labeled) to a training set for a face recognition engine, and control the identity of a chosen person at test time. Because the attacker does not need to control the labeling function, poisons could be entered into the training set simply by leaving them on the web and waiting for them to be scraped by a data collection bot. We present an optimization-based method for crafting poisons, and show that just one single poison image can control classifier behavior when transfer learning is used. For full end-to-end training, we present a "watermarking" strategy that makes poisoning reliable using multiple ($\approx$50) poisoned training instances. We demonstrate our method by generating poisoned frog images from the CIFAR dataset and using them to manipulate image classifiers.

OCNov 1, 2017
SCDA: School Compatibility Decomposition Algorithm for Solving the Multi-School Bus Routing and Scheduling Problem

Zhongxiang Wang, Ali Shafahi, Ali Haghani

Safely serving the school transportation demand with the minimum number of buses is one of the highest financial goals of school transportation directors. To achieve that objective, a good and efficient way to solve the routing and scheduling problem is required. Due to the growth of the computing power, the spotlight has been shed on solving the combined problem of the school bus routing and scheduling problem. We show that an integrated multi-school bus routing and scheduling can be formulated with the help of trip compatibility. A novel decomposition algorithm is proposed to solve the integrated model. The merit of this integrated model and the decomposition method is that with the consideration of the trip compatibility, the interrelationship between the routing and scheduling sub-problems will not be lost in the process of decomposition. Results show the proposed decomposed problem could provide the solutions using the same number of buses as the integrated model in much shorter time (as little as 0.6%) and that the proposed method can save up to 26% number of buses from existing research.

OCNov 1, 2017
School bus routing by maximizing trip compatibility

Ali Shafahi, Zhongxiang Wang, Ali Haghani

School bus planning is usually divided into routing and scheduling due to the complexity of solving them concurrently. However, the separation between these two steps may lead to worse solutions with higher overall costs than that from solving them together. When finding the minimal number of trips in the routing problem, neglecting the importance of trip compatibility may increase the number of buses actually needed in the scheduling problem. This paper proposes a new formulation for the multi-school homogeneous fleet routing problem that maximizes trip compatibility while minimizing total travel time. This incorporates the trip compatibility for the scheduling problem in the routing problem. Since the problem is inherently just a routing problem, finding a good solution is not cumbersome. To compare the performance of the model with traditional routing problems, we generate eight mid-size data sets. Through importing the generated trips of the routing problems into the bus scheduling (blocking) problem, it is shown that the proposed model uses up to 13% fewer buses than the common traditional routing models.

AIAug 14, 2017
Understanding and Visualizing the District of Columbia Capital Bikeshare System Using Data Analysis for Balancing Purposes

Kiana Roshan Zamir, Ali Shafahi, Ali Haghani

Bike sharing systems' popularity has consistently been rising during the past years. Managing and maintaining these emerging systems are indispensable parts of these systems. Visualizing the current operations can assist in getting a better grasp on the performance of the system. In this paper, a data mining approach is used to identify and visualize some important factors related to bike-share operations and management. To consolidate the data, we cluster stations that have a similar pickup and drop-off profiles during weekdays and weekends. We provide the temporal profile of the center of each cluster which can be used as a simple and practical approach for approximating the number of pickups and drop-offs of the stations. We also define two indices based on stations' shortages and surpluses that reflect the degree of balancing aid a station needs. These indices can help stakeholders improve the quality of the bike-share user experience in at-least two ways. It can act as a complement to balancing optimization efforts, and it can identify stations that need expansion. We mine the District of Columbia's regional bike-share data and discuss the findings of this data set. We examine the bike-share system during different quarters of the year and during both peak and non-peak hours. Findings reflect that on weekdays most of the pickups and drop-offs happen during the morning and evening peaks whereas on weekends pickups and drop-offs are spread out throughout the day. We also show that throughout the day, more than 40% of the stations are relatively self-balanced. Not worrying about these stations during ordinary days can allow the balancing efforts to focus on a fewer stations and therefore potentially improve the efficiency of the balancing optimization models.