CR Jul 19, 2023
Abusing Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs
Eugene Bagdasaryan, Tsung-Yin Hsieh, Ben Nassi et al.
We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the user asks the (unmodified, benign) model about the perturbed image or audio, the perturbation steers the model to output the attacker-chosen text and/or make the subsequent dialog follow the attacker's instruction. We illustrate this attack with several proof-of-concept examples targeting LLaVA and PandaGPT.
CR Sep 5, 2023
The Adversarial Implications of Variable-Time Inference
Dudi Biton, Aditi Misra, Efrat Levy et al.
Machine learning (ML) models are known to be vulnerable to a number of attacks that target the integrity of their predictions or the privacy of their training data. To carry out these attacks, a black-box adversary must typically possess the ability to query the model and observe its outputs (e.g., labels). In this work, we demonstrate, for the first time, the ability to enhance such decision-based attacks. To accomplish this, we present an approach that exploits a novel side channel in which the adversary simply measures the execution time of the algorithm used to post-process the predictions of the ML model under attack. The leakage of inference-state elements into algorithmic timing side channels has never been studied before, and we have found that it can contain rich information that facilitates superior timing attacks that significantly outperform attacks based solely on label outputs. In a case study, we investigate leakage from the non-maximum suppression (NMS) algorithm, which plays a crucial role in the operation of object detectors. In our examination of the timing side-channel vulnerabilities associated with this algorithm, we identified the potential to enhance decision-based attacks. We demonstrate attacks against the YOLOv3 detector, leveraging the timing leakage to successfully evade object detection using adversarial examples, and perform dataset inference. Our experiments show that our adversarial examples exhibit superior perturbation quality compared to a decision-based attack. In addition, we present a new threat model in which dataset inference based solely on timing leakage is performed. To address the timing leakage vulnerability inherent in the NMS algorithm, we explore the potential and limitations of implementing constant-time inference passes as a mitigation strategy.
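To make the timing leakage described above concrete, here is a minimal sketch of a greedy NMS pass whose amount of work depends on the candidate boxes it receives. It is a toy illustration, not the YOLOv3 post-processing code or the paper's measurement setup: the names `iou`, `nms_with_cost`, and `make_boxes` are hypothetical, and the count of IoU evaluations stands in for execution time as a deterministic proxy.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2, score); this layout is an assumption for the sketch.
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_with_cost(boxes: List[Box], iou_threshold: float = 0.5) -> Tuple[List[Box], int]:
    """Greedy NMS. Returns the kept boxes and the number of IoU evaluations,
    used here as a stand-in for the running time an observer might measure."""
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept: List[Box] = []
    comparisons = 0
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        survivors = []
        for other in remaining:
            comparisons += 1
            # Boxes overlapping the current best too much are suppressed.
            if iou(best, other) <= iou_threshold:
                survivors.append(other)
        remaining = survivors
    return kept, comparisons


def make_boxes(n: int) -> List[Box]:
    """n non-overlapping unit boxes laid out along the x axis."""
    return [(2.0 * i, 0.0, 2.0 * i + 1.0, 1.0, 0.9) for i in range(n)]


if __name__ == "__main__":
    for n in (5, 20):
        _, cost = nms_with_cost(make_boxes(n))
        print(n, "candidate boxes ->", cost, "IoU evaluations")
```

In this toy version the work grows with the number of surviving candidates, so an observer who can only time the post-processing step still learns something about the model's internal predictions. A constant-time pass would have to do the same amount of work regardless of the candidates.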
CV Nov 24, 2022
Seeds Don't Lie: An Adaptive Watermarking Framework for Computer Vision Models
Jacob Shams, Ben Nassi, Ikuya Morikawa et al.
In recent years, various watermarking methods have been suggested to detect computer vision models obtained illegitimately from their owners; however, they fail to demonstrate satisfactory robustness against model extraction attacks. In this paper, we present an adaptive framework to watermark a protected model, leveraging the unique behavior present in the model due to a unique random seed initialized during the model training. This watermark is used to detect extracted models, which have the same unique behavior, indicating an unauthorized usage of the protected model's intellectual property (IP). First, we show how an initial seed for random number generation as part of model training produces distinct characteristics in the model's decision boundaries, which are inherited by extracted models and present in their decision boundaries, but aren't present in non-extracted models trained on the same dataset with a different seed. Based on our findings, we suggest the Robust Adaptive Watermarking (RAW) Framework, which utilizes the unique behavior present in the protected and extracted models to generate a watermark key-set and verification model. We show that the framework is robust to (1) unseen model extraction attacks, and (2) extracted models which undergo a blurring method (e.g., weight pruning). We evaluate the framework's robustness against a naive attacker (unaware that the model is watermarked), and an informed attacker (who employs blurring strategies to remove watermarked behavior from an extracted model), and achieve AUC values above 0.9. Finally, we show that the framework is robust to model extraction attacks with different structure and/or architecture than the protected model.
LG May 13, 2022
EyeDAS: Securing Perception of Autonomous Cars Against the Stereoblindness Syndrome
Efrat Levy, Ben Nassi, Raz Swissa et al.
The ability to detect whether an object is a 2D or 3D object is extremely important in autonomous driving, since a detection error can have life-threatening consequences, endangering the safety of the driver, passengers, pedestrians, and others on the road. Methods proposed to distinguish between 2D and 3D objects (e.g., liveness detection methods) are not suitable for autonomous driving, because they are object dependent or do not consider the constraints associated with autonomous driving (e.g., the need for real-time decision-making while the vehicle is moving). In this paper, we present EyeDAS, a novel few-shot learning-based method aimed at securing an object detector (OD) against the threat posed by the stereoblindness syndrome (i.e., the inability to distinguish between 2D and 3D objects). We evaluate EyeDAS's real-time performance using 2,000 objects extracted from seven YouTube video recordings of street views taken by a dash cam from the driver's seat perspective. When applying EyeDAS to seven state-of-the-art ODs as a countermeasure, EyeDAS was able to reduce the 2D misclassification rate from 71.42-100% to 2.4% with a 3D misclassification rate of 0% (TPR of 1.0). We also show that EyeDAS outperforms the baseline method and achieves an AUC of over 0.999 and a TPR of 1.0 with an FPR of 0.024.
CR Sep 12, 2024
Unleashing Worms and Extracting Data: Escalating the Outcome of Attacks against RAG-based Inference in Scale and Severity Using Jailbreaking
Stav Cohen, Ron Bitton, Ben Nassi
In this paper, we show that with the ability to jailbreak a GenAI model, attackers can escalate the outcome of attacks against RAG-based GenAI-powered applications in severity and scale. In the first part of the paper, we show that attackers can escalate RAG membership inference attacks and RAG entity extraction attacks to RAG document extraction attacks, forcing a more severe outcome compared to existing attacks. We evaluate the results obtained from three extraction methods, the influence of the type and the size of five embedding algorithms employed, the size of the provided context, and the GenAI engine. We show that attackers can extract 80%-99.8% of the data stored in the database used by the RAG of a Q&A chatbot. In the second part of the paper, we show that attackers can escalate the scale of RAG data poisoning attacks from compromising a single GenAI-powered application to compromising the entire GenAI ecosystem, forcing a greater scale of damage. This is done by crafting an adversarial self-replicating prompt that triggers a chain reaction of a computer worm within the ecosystem and forces each affected application to perform a malicious activity and compromise the RAG of additional applications. We evaluate the performance of the worm in creating a chain of confidential data extraction about users within a GenAI ecosystem of GenAI-powered email assistants and analyze how the performance of the worm is affected by the size of the context, the adversarial self-replicating prompt used, the type and size of the embedding algorithm employed, and the number of hops in the propagation. Finally, we review and analyze guardrails to protect RAG-based inference and discuss the tradeoffs.
CR Aug 9, 2024
A Jailbroken GenAI Model Can Cause Substantial Harm: GenAI-powered Applications are Vulnerable to PromptWares
Stav Cohen, Ron Bitton, Ben Nassi
In this paper, we argue that a jailbroken GenAI model can cause substantial harm to GenAI-powered applications and facilitate PromptWare, a new type of attack that flips the GenAI model's behavior from serving an application to attacking it. PromptWare exploits user inputs to jailbreak a GenAI model to force/perform malicious activity within the context of a GenAI-powered application. First, we introduce a naive implementation of PromptWare that behaves as malware that targets Plan & Execute architectures (a.k.a. ReAct, function calling). We show that attackers could force a desired execution flow by creating a user input that produces desired outputs given that the logic of the GenAI-powered application is known to attackers. We demonstrate the application of a DoS attack that drives a GenAI-powered assistant into an infinite loop that wastes money and computational resources on redundant API calls to a GenAI engine, preventing the application from providing service to a user. Next, we introduce a more sophisticated implementation of PromptWare that we name Advanced PromptWare Threat (APwT) that targets GenAI-powered applications whose logic is unknown to attackers. We show that attackers could create user input that exploits the GenAI engine's advanced AI capabilities to launch a kill chain at inference time consisting of six steps intended to escalate privileges, analyze the application's context, identify valuable assets, reason about possible malicious activities, decide on one of them, and execute it. We demonstrate the application of APwT against a GenAI-powered e-commerce chatbot and show that it can trigger the modification of SQL tables, potentially leading to unauthorized discounts on the items sold to the user.
CR Jan 14
The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multi-Step Malware
Ben Nassi, Bruce Schneier, Oleg Brodt
The rapid adoption of large language model (LLM)-based systems -- from chatbots to autonomous agents capable of executing code and financial transactions -- has created a new attack surface that existing security frameworks inadequately address. The dominant framing of these threats as "prompt injection" -- a catch-all phrase for security failures in LLM-based systems -- obscures a more complex reality: Attacks on LLM-based systems increasingly involve multi-step sequences that mirror traditional malware campaigns. In this paper, we propose that attacks targeting LLM-based applications constitute a distinct class of malware, which we term promptware, and introduce a five-step kill chain model for analyzing these threats. The framework comprises Initial Access (prompt injection), Privilege Escalation (jailbreaking), Persistence (memory and retrieval poisoning), Lateral Movement (cross-system and cross-user propagation), and Actions on Objective (ranging from data exfiltration to unauthorized transactions). By mapping recent attacks to this structure, we demonstrate that LLM-related attacks follow systematic sequences analogous to traditional malware campaigns. The promptware kill chain offers security practitioners a structured methodology for threat modeling and provides a common vocabulary for researchers across AI safety and cybersecurity to address a rapidly evolving threat landscape.
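The five stages named above can be written down as an ordered enumeration, which is one way a practitioner might tag observed behavior during threat modeling. This is only a sketch: the stage names and parenthetical techniques come from the abstract, while the `Stage` type, the `missing_stages` helper, and the example incident are hypothetical.

```python
from enum import IntEnum
from typing import Iterable, List


class Stage(IntEnum):
    """Kill chain stages in the order listed in the abstract."""
    INITIAL_ACCESS = 1        # prompt injection
    PRIVILEGE_ESCALATION = 2  # jailbreaking
    PERSISTENCE = 3           # memory and retrieval poisoning
    LATERAL_MOVEMENT = 4      # cross-system and cross-user propagation
    ACTIONS_ON_OBJECTIVE = 5  # data exfiltration, unauthorized transactions


def missing_stages(observed: Iterable[Stage]) -> List[Stage]:
    """Stages of the chain not seen in an incident, in chain order."""
    seen = set(observed)
    return [stage for stage in Stage if stage not in seen]


if __name__ == "__main__":
    # Made-up incident: an injected prompt jailbreaks the model and leaks data,
    # with no persistence or propagation observed.
    incident = [
        Stage.INITIAL_ACCESS,
        Stage.PRIVILEGE_ESCALATION,
        Stage.ACTIONS_ON_OBJECTIVE,
    ]
    print([stage.name for stage in missing_stages(incident)])
```

Using an ordered type keeps the mapping explicit and makes it easy to see which stages an incident report leaves unaccounted for.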
CR Dec 24, 2024
SoK: On the Offensive Potential of AI
Saskia Laura Schröer, Giovanni Apruzzese, Soheil Human et al.
Our society increasingly benefits from Artificial Intelligence (AI). Unfortunately, more and more evidence shows that AI is also used for offensive purposes. Prior works have revealed various examples of use cases in which the deployment of AI can lead to violation of security and privacy objectives. No extant work, however, has been able to draw a holistic picture of the offensive potential of AI. In this SoK paper, we seek to lay the ground for a systematic analysis of the heterogeneous capabilities of offensive AI. In particular, we (i) account for AI risks to both humans and systems while (ii) consolidating and distilling knowledge from academic literature, expert opinions, industrial venues, as well as laypeople -- all of which are valuable sources of information on offensive AI. To enable alignment of such diverse sources of knowledge, we devise a common set of criteria reflecting essential technological factors related to offensive AI. With the help of such criteria, we systematically analyze: 95 research papers; 38 InfoSec briefings (from, e.g., BlackHat); the responses of a user study (N=549) entailing individuals with diverse backgrounds and expertise; and the opinion of 12 experts. Our contributions not only reveal concerning ways (some of which were overlooked by prior work) in which AI can be offensively used today, but also represent a foothold to address this threat in the years to come.
CV May 8, 2025
PaniCar: Securing the Perception of Advanced Driving Assistance Systems Against Emergency Vehicle Lighting
Elad Feldman, Jacob Shams, Dudi Biton et al.
The safety of autonomous cars has come under scrutiny in recent years, especially after 16 documented incidents involving Teslas (with autopilot engaged) crashing into parked emergency vehicles (police cars, ambulances, and firetrucks). While previous studies have revealed that strong light sources often introduce flare artifacts in the captured image, which degrade the image quality, the impact of flare on object detection performance remains unclear. In this research, we unveil PaniCar, a digital phenomenon that causes an object detector's confidence score to fluctuate below detection thresholds when exposed to activated emergency vehicle lighting. This vulnerability poses a significant safety risk, and can cause autonomous vehicles to fail to detect objects near emergency vehicles. In addition, this vulnerability could be exploited by adversaries to compromise the security of advanced driving assistance systems (ADASs). We assess seven commercial ADASs (Tesla Model 3, "manufacturer C", HP, Pelsee, AZDOME, Imagebon, Rexing), four object detectors (YOLO, SSD, RetinaNet, Faster R-CNN), and 14 patterns of emergency vehicle lighting to understand the influence of various technical and environmental factors. We also evaluate four SOTA flare removal methods and show that their performance and latency are insufficient for real-time driving constraints. To mitigate this risk, we propose Caracetamol, a robust framework designed to enhance the resilience of object detectors against the effects of activated emergency vehicle lighting. Our evaluation shows that on YOLOv3 and Faster R-CNN, Caracetamol improves the models' average confidence of car detection by 0.20, the lower confidence bound by 0.33, and reduces the fluctuation range by 0.33. In addition, Caracetamol is capable of processing frames at a rate of 30-50 FPS, enabling real-time ADAS car detection.
CV Jan 26, 2025
A Privacy Enhancing Technique to Evade Detection by Street Video Cameras Without Using Adversarial Accessories
Jacob Shams, Ben Nassi, Satoru Koda et al.
In this paper, we propose a privacy-enhancing technique leveraging an inherent property of automatic pedestrian detection algorithms, namely, that the training of deep neural network (DNN) based methods is generally performed using curated datasets and laboratory settings, while the operational areas of these methods are dynamic real-world environments. In particular, we leverage a novel side effect of this gap between the laboratory and the real world: location-based weakness in pedestrian detection. We demonstrate that the position (distance, angle, height) of a person, and ambient light level, directly impact the confidence of a pedestrian detector when detecting the person. We then demonstrate that this phenomenon is present in pedestrian detectors observing a stationary scene of pedestrian traffic, with blind spot areas of weak detection of pedestrians with low confidence. We show how privacy-concerned pedestrians can leverage these blind spots to evade detection by constructing a minimum confidence path between two points in a scene, reducing the maximum confidence and average confidence of the path by up to 0.09 and 0.13, respectively, over direct and random paths through the scene. To counter this phenomenon, and force the use of more costly and sophisticated methods to leverage this vulnerability, we propose a novel countermeasure to improve the confidence of pedestrian detectors in blind spots, raising the max/average confidence of paths generated by our technique by 0.09 and 0.05, respectively. In addition, we demonstrate that our countermeasure improves a Faster R-CNN-based pedestrian detector's TPR and average true positive confidence by 0.03 and 0.15, respectively.
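The idea of a minimum confidence path can be sketched as a shortest-path search over a grid of detector confidences. The formulation below is an assumption made for illustration, not the paper's algorithm: the scene is discretized into cells, each cell holds a made-up confidence value, movement is 4-connected, and the path cost is the sum of the confidences of the visited cells (Dijkstra's algorithm). The name `min_confidence_path` is hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def min_confidence_path(
    grid: List[List[float]], start: Cell, goal: Cell
) -> Tuple[List[Cell], float]:
    """Dijkstra over a 4-connected grid; cost of a path is the sum of the
    detector confidences of its cells (start and goal included).
    Assumes the goal is reachable from the start."""
    rows, cols = len(grid), len(grid[0])
    best: Dict[Cell, float] = {start: grid[start[0]][start[1]]}
    prev: Dict[Cell, Cell] = {}
    heap = [(best[start], start)]
    while heap:
        cost, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if cost > best[cell]:
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + grid[nr][nc]
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (new_cost, (nr, nc)))
    # Walk predecessors back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, best[goal]


if __name__ == "__main__":
    # Made-up confidences: the top-middle cell is well covered by the camera,
    # the bottom row is a blind spot.
    scene = [
        [0.5, 0.9, 0.5],
        [0.1, 0.1, 0.1],
    ]
    path, total = min_confidence_path(scene, (0, 0), (0, 2))
    print(path, round(total, 2))
```

Summing confidences penalizes long detours, so this variant trades path length against exposure. Minimizing the maximum confidence along the path would be a different objective and would need a different cost rule.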
CV Jan 14, 2025
Towards an End-to-End (E2E) Adversarial Learning and Application in the Physical World
Dudi Biton, Jacob Shams, Satoru Koda et al.
The traditional learning process of patch-based adversarial attacks, conducted in the digital domain and then applied in the physical domain (e.g., via printed stickers), may suffer from reduced performance due to adversarial patches' limited transferability from the digital domain to the physical domain. Given that previous studies have considered using projectors to apply adversarial attacks, we raise the following question: can adversarial learning (i.e., patch generation) be performed entirely in the physical domain with a projector? In this work, we propose the Physical-domain Adversarial Patch Learning Augmentation (PAPLA) framework, a novel end-to-end (E2E) framework that converts adversarial learning from the digital domain to the physical domain using a projector. We evaluate PAPLA across multiple scenarios, including controlled laboratory settings and realistic outdoor environments, demonstrating its ability to ensure attack success compared to conventional digital learning-physical application (DL-PA) methods. We also analyze the impact of environmental factors, such as projection surface color, projector strength, ambient light, distance, and angle of the target object relative to the camera, on the effectiveness of projected patches. We then demonstrate the feasibility of the attack against a parked car and a stop sign in a real-world outdoor environment. Our results show that under specific conditions, E2E adversarial learning in the physical domain eliminates the transferability issue and ensures evasion by object detectors. Finally, we provide insights into the challenges and opportunities of applying adversarial learning in the physical domain and explain where such an approach is more effective than using a sticker.
CR Feb 21, 2022
bAdvertisement: Attacking Advanced Driver-Assistance Systems Using Print Advertisements
Ben Nassi, Jacob Shams, Raz Ben Netanel et al.
In this paper, we present bAdvertisement, a novel attack method against advanced driver-assistance systems (ADASs). bAdvertisement is performed as a supply chain attack via a compromised computer in a printing house, by embedding a "phantom" object in a print advertisement. When the compromised print advertisement is observed by an ADAS in a passing car, an undesired reaction is triggered from the ADAS. We analyze state-of-the-art object detectors and show that they do not take color or context into account in object detection. Our validation of these findings on Mobileye 630 PRO shows that this ADAS also fails to take color or context into account. Then, we show how an attacker can take advantage of these findings to execute an attack on a commercial ADAS, by embedding a phantom road sign in a print advertisement, which causes a car equipped with Mobileye 630 PRO to trigger a false notification to slow down. Finally, we discuss multiple countermeasures which can be deployed in order to mitigate the effect of our proposed attack.
CR Jan 2, 2022
VISAS -- Detecting GPS spoofing attacks against drones by analyzing camera's video stream
Barak Davidovich, Ben Nassi, Yuval Elovici
In this study, we propose an innovative method for the real-time detection of GPS spoofing attacks targeting drones, based on the video stream captured by a drone's camera. The proposed method collects frames from the video stream and their location (GPS); by calculating the correlation between frames, our method can identify an attack on a drone. We first analyze the performance of the suggested method in a controlled environment by conducting experiments on a flight simulator that we developed. Then, we analyze its performance in the real world using a DJI drone. Our method can provide different levels of security against GPS spoofing attacks, depending on the detection interval required. For example, it can provide a high level of security to a drone flying at an altitude of 50-100 meters over an urban area at an average speed of 4 km/h in conditions of low ambient light; in this scenario, the method detects any GPS spoofing attack in which the spoofed location is 1-4 meters (an average of 2.5 meters) from the real location.
CR Jun 24, 2019
MobilBye: Attacking ADAS with Camera Spoofing
Dudi Nassi, Raz Ben-Netanel, Yuval Elovici et al.
Advanced driver assistance systems (ADASs) were developed to reduce the number of car accidents by issuing driver alerts or controlling the vehicle. In this paper, we tested the robustness of Mobileye, a popular external ADAS. We injected spoofed traffic signs into Mobileye to assess the influence of environmental changes (e.g., changes in color, shape, projection speed, diameter and ambient light) on the outcome of an attack. To conduct this experiment in a realistic scenario, we used a drone to carry a portable projector which projected the spoofed traffic sign on a driving car. Our experiments show that it is possible to fool Mobileye so that it interprets the drone-carried spoofed traffic sign as a real traffic sign.
CR Mar 12, 2019
SoK - Security and Privacy in the Age of Drones: Threats, Challenges, Solution Mechanisms, and Scientific Gaps
Ben Nassi, Asaf Shabtai, Ryusuke Masuoka et al.
The evolution of drone technology in the past nine years since the first commercial drone was introduced at CES 2010 has caused many individuals and businesses to adopt drones for various purposes. We are currently living in an era in which drones are being used for pizza delivery, the shipment of goods, and filming, and they are likely to provide an alternative for transportation in the near future. However, drones also pose a significant challenge in terms of security and privacy within society (for both individuals and organizations), and many drone-related incidents are reported on a daily basis. These incidents have called attention to the need to detect and disable drones used for malicious purposes and opened up a new area of research and development for academia and industry, with a market that is expected to reach $1.85 billion by 2024. While some of the knowledge used to detect UAVs has been adopted for drone detection, new methods have been suggested by industry and academia alike to deal with the challenges associated with detecting very small and fast-flying objects. In this paper, we describe new societal threats to security and privacy created by drones, and present academic and industrial methods used to detect and disable drones. We review methods targeted at areas that restrict drone flights and analyze their effectiveness with regard to various factors (e.g., weather, birds, ambient light, etc.). We present the challenges arising in areas that allow drone flights, introduce the methods that exist for dealing with these challenges, and discuss the scientific gaps that exist in this area. We then review methods used to disable drones, analyze their effectiveness, and present their expected results. Finally, we suggest future research directions.
CR Aug 6, 2018
Piping Botnet - Turning Green Technology into a Water Disaster
Ben Nassi, Moshe Sror, Ido Lavi et al.
The current generation of IoT devices is being used by clients and consumers to regulate resources (such as water and electricity) obtained from critical infrastructure (such as urban water services and smart grids), creating a new attack vector against critical infrastructure. In this research, we show that smart irrigation systems, a new type of green technology and IoT device aimed at saving water and money, can be used by attackers as a means of attacking urban water services. We present a distributed attack model that can be used by an attacker to attack urban water services using a botnet of commercial smart irrigation systems. Then, we show how a bot running on a compromised device in a LAN can: (1) detect a connected commercial smart irrigation system (RainMachine, BlueSpray, and GreenIQ) within 15 minutes by analyzing the LAN's behavior using a dedicated classification model, and (2) launch watering via a commercial smart irrigation system according to an attacker's wishes using spoofing and replay attacks. In addition, we model the damage that can be caused by performing such an attack and show that a standard water tower can be emptied in an hour using a botnet of 1,355 sprinklers and a flood water reservoir can be emptied overnight using a botnet of 23,866 sprinklers. Finally, we discuss countermeasure methods and hypothesize whether the next generation of plumbers will use Kali Linux instead of a monkey wrench.
CR Jan 9, 2018
Game of Drones - Detecting Streamed POI from Encrypted FPV Channel
Ben Nassi, Raz Ben-Netanel, Adi Shamir et al.
Drones have created a new threat to people's privacy. We are now in an era in which anyone with a drone equipped with a video camera can use it to invade a subject's privacy by streaming the subject in his/her private space over an encrypted first person view (FPV) channel. Although many methods have been suggested to detect nearby drones, they all suffer from the same shortcoming: they cannot identify exactly what is being captured, and therefore they fail to distinguish between the legitimate use of a drone (for example, to use a drone to film a selfie from the air) and illegitimate use that invades someone's privacy (when the same operator uses the drone to stream the view into the window of his neighbor's apartment), a distinction that in some cases depends on the orientation of the drone's video camera rather than on the drone's location. In this paper we shatter the commonly held belief that the use of encryption to secure an FPV channel prevents an interceptor from extracting the POI that is being streamed. We show methods that leverage physical stimuli to detect whether the drone's camera is directed towards a target in real time. We investigate the influence of changing pixels on the FPV channel (in a lab setup). Based on our observations we demonstrate how an interceptor can perform a side-channel attack to detect whether a target is being streamed by analyzing the encrypted FPV channel that is transmitted from a real drone (DJI Mavic) in two use cases: when the target is a private house and when the target is a subject.
CR Mar 22, 2017
Oops!...I think I scanned a malware
Ben Nassi, Adi Shamir, Yuval Elovici
This article presents a proof-of-concept illustrating the feasibility of creating a covert channel between a C&C server and malware installed in an organization by exploiting an organization's scanner and using it as a means of interaction. We take advantage of the light sensitivity of a flatbed scanner, using a light source to infiltrate data to an organization. We present an implementation of the method for different purposes (even to trigger a ransomware attack) in various experimental setups using: (1) a laser connected to a stand, (2) a laser carried by a drone, and (3) a hijacked smart bulb within the targeted organization from a passing car. In our experiments, we were able to infiltrate data using different types of light sources (including infrared light), from a distance of up to 900 meters away from the scanner. We discuss potential countermeasures to prevent the attack.
CR Dec 19, 2016
Handwritten Signature Verification Using Hand-Worn Devices
Ben Nassi, Alona Levy, Yuval Elovici et al.
Online signature verification technologies, such as those available in banks and post offices, rely on dedicated digital devices such as tablets or smart pens to capture, analyze and verify signatures. In this paper, we suggest a novel method for online signature verification that relies on the increasingly available hand-worn devices, such as smartwatches or fitness trackers, instead of dedicated ad-hoc devices. Our method uses a set of known genuine and forged signatures, recorded using the motion sensors of a hand-worn device, to train a machine learning classifier. Then, given the recording of an unknown signature and a claimed identity, the classifier can determine whether the signature is genuine or forged. In order to validate our method, we applied it to 1,980 recordings of genuine and forged signatures that we collected from 66 subjects in our institution. Using our method, we were able to successfully distinguish between genuine and forged signatures with a high degree of accuracy (0.98 AUC and 0.05 EER).
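The equal error rate (EER) reported above is the operating point at which the rate of accepted forgeries equals the rate of rejected genuine signatures. The sketch below shows one simple way to estimate it from classifier scores. The scores are made up for illustration, `equal_error_rate` is a hypothetical helper, and the threshold sweep is an assumed estimation procedure, not the authors' evaluation code.

```python
from typing import List


def equal_error_rate(genuine: List[float], forged: List[float]) -> float:
    """Estimate the EER by sweeping a threshold over all observed scores.
    A signature is accepted when its score is >= the threshold."""
    thresholds = sorted(set(genuine) | set(forged))
    best_gap = float("inf")
    eer = 1.0
    for t in thresholds:
        # False reject rate: genuine signatures scored below the threshold.
        frr = sum(score < t for score in genuine) / len(genuine)
        # False accept rate: forged signatures scored at or above the threshold.
        far = sum(score >= t for score in forged) / len(forged)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Made-up classifier scores (higher means "more likely genuine").
    genuine_scores = [0.9, 0.8, 0.7, 0.4]
    forged_scores = [0.6, 0.3, 0.2, 0.1]
    print(equal_error_rate(genuine_scores, forged_scores))
```

With few samples the two error curves may never cross exactly, which is why the sketch reports the midpoint at the threshold where they are closest.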
HC Dec 14, 2016
Virtual Breathalyzer
Ben Nassi, Lior Rokach, Yuval Elovici
Driving under the influence of alcohol is a widespread phenomenon in the US, where it is considered a major cause of fatal accidents. In this research, we present a novel approach and concept for detecting intoxication from motion differences obtained by the sensors of wearable devices. We formalize the problem of drunkenness detection as a supervised machine learning task, both as a binary classification problem (drunk or sober) and a regression problem (the breath alcohol content level). In order to test our approach, we collected data from 30 different subjects (patrons at three bars) using Google Glass and the LG G-watch, Microsoft Band, and Samsung Galaxy S4. We validated our results against an admissible breathalyzer used by the police. A system based on this concept successfully detected intoxication and achieved the following results: 0.95 AUC and 0.05 FPR, given a fixed TPR of 1.0. Applications based on our system can be used to analyze the free gait of drinkers when they walk from the car to the bar and vice versa, in order to alert people, or even a connected car, and prevent people from driving under the influence of alcohol.