Edan Habler

CR
h-index73
7papers
192citations
Novelty45%
AI Score41

7 Papers

CRJan 14, 2025Code
Tag&Tab: Pretraining Data Detection in Large Language Models Using Keyword-Based Membership Inference Attack

Sagiv Antebi, Edan Habler, Asaf Shabtai et al.

Large language models (LLMs) have become essential tools for digital task assistance. Their training relies heavily on the collection of vast amounts of data, which may include copyright-protected or sensitive information. Recent studies on detecting pretraining data in LLMs have primarily focused on sentence- or paragraph-level membership inference attacks (MIAs), usually involving probability analysis of the target model's predicted tokens. However, these methods often exhibit poor accuracy, failing to account for the semantic importance of textual content and word significance. To address these shortcomings, we propose Tag&Tab, a novel approach for detecting data used in LLM pretraining. Our method leverages established natural language processing (NLP) techniques to tag keywords in the input text, a process we term Tagging. Then, the LLM is used to obtain probabilities for these keywords and calculate their average log-likelihood to determine input text membership, a process we refer to as Tabbing. Our experiments on four benchmark datasets (BookMIA, MIMIR, PatentMIA, and the Pile) and several open-source LLMs of varying sizes demonstrate an average increase in AUC scores ranging from 5.3% to 17.6% over state-of-the-art methods. Tag&Tab not only sets a new standard for data leakage detection in LLMs, but its outstanding performance is a testament to the importance of words in MIAs on LLMs.

CLJun 17, 2025Code
LexiMark: Robust Watermarking via Lexical Substitutions to Enhance Membership Verification of an LLM's Textual Training Data

Eyal German, Sagiv Antebi, Edan Habler et al.

Large language models (LLMs) can be trained or fine-tuned on data obtained without the owner's consent. Verifying whether a specific LLM was trained on particular data instances or an entire dataset is extremely challenging. Dataset watermarking addresses this by embedding identifiable modifications in training data to detect unauthorized use. However, existing methods often lack stealth, making them relatively easy to detect and remove. In light of these limitations, we propose LexiMark, a novel watermarking technique designed for text and documents, which embeds synonym substitutions for carefully selected high-entropy words. Our method aims to enhance an LLM's memorization capabilities on the watermarked text without altering the semantic integrity of the text. As a result, the watermark is difficult to detect, blending seamlessly into the text with no visible markers, and is resistant to removal due to its subtle, contextually appropriate substitutions that evade automated and manual detection. We evaluated our method using baseline datasets from recent studies and seven open-source models: LLaMA-1 7B, LLaMA-3 8B, Mistral 7B, Pythia 6.9B, as well as three smaller variants from the Pythia family (160M, 410M, and 1B). Our evaluation spans multiple training settings, including continued pretraining and fine-tuning scenarios. The results demonstrate significant improvements in AUROC scores compared to existing methods, underscoring our method's effectiveness in reliably verifying whether unauthorized watermarked data was used in LLM training.

CRJan 17, 2024
GPT in Sheep's Clothing: The Risk of Customized GPTs

Sagiv Antebi, Noam Azulay, Edan Habler et al.

In November 2023, OpenAI introduced a new service allowing users to create custom versions of ChatGPT (GPTs) by using specific instructions and knowledge to guide the model's behavior. We aim to raise awareness of the fact that GPTs can be used maliciously, posing privacy and security risks to their users.

CRJun 8, 2025
Mind the Web: The Security of Web Use Agents

Avishag Shapira, Parth Atulbhai Gandhi, Edan Habler et al.

Web-use agents are rapidly being deployed to automate complex web tasks with extensive browser capabilities. However, these capabilities create a critical and previously unexplored attack surface. This paper demonstrates how attackers can exploit web-use agents by embedding malicious content in web pages, such as comments, reviews, or advertisements, that agents encounter during legitimate browsing tasks. We introduce the task-aligned injection technique that frames malicious commands as helpful task guidance rather than obvious attacks, exploiting fundamental limitations in LLMs' contextual reasoning. Agents struggle to maintain coherent contextual awareness and fail to detect when seemingly helpful web content contains steering attempts that deviate them from their original task goal. To scale this attack, we developed an automated three-stage pipeline that generates effective injections without manual annotation or costly online agent interactions during training, remaining efficient even with limited training data. This pipeline produces a generator model that we evaluate on five popular agents using payloads organized by the Confidentiality-Integrity-Availability (CIA) security triad, including unauthorized camera activation, file exfiltration, user impersonation, phishing, and denial-of-service. This generator achieves over 80% attack success rate (ASR) with strong transferability across unseen payloads, diverse web environments, and different underlying LLMs. This attack succeed even against agents with built-in safety mechanisms, requiring only the ability to post content on public websites. To address this risk, we propose comprehensive mitigation strategies including oversight mechanisms, execution constraints, and task-aware reasoning techniques.

CRJan 16, 2022
Adversarial Machine Learning Threat Analysis and Remediation in Open Radio Access Network (O-RAN)

Edan Habler, Ron Bitton, Dan Avraham et al.

O-RAN is a new, open, adaptive, and intelligent RAN architecture. Motivated by the success of artificial intelligence in other domains, O-RAN strives to leverage machine learning (ML) to automatically and efficiently manage network resources in diverse use cases such as traffic steering, quality of experience prediction, and anomaly detection. Unfortunately, it has been shown that ML-based systems are vulnerable to an attack technique referred to as adversarial machine learning (AML). This special kind of attack has already been demonstrated in recent studies and in multiple domains. In this paper, we present a systematic AML threat analysis for O-RAN. We start by reviewing relevant ML use cases and analyzing the different ML workflow deployment scenarios in O-RAN. Then, we define the threat model, identifying potential adversaries, enumerating their adversarial capabilities, and analyzing their main goals. Next, we explore the various AML threats associated with O-RAN and review a large number of attacks that can be performed to realize these threats and demonstrate an AML attack on a traffic steering model. In addition, we analyze and propose various AML countermeasures for mitigating the identified threats. Finally, based on the identified AML threats and countermeasures, we present a methodology and a tool for performing risk assessment for AML attacks for a specific ML use case in O-RAN.

CRJun 19, 2019
VizADS-B: Analyzing Sequences of ADS-B Images Using Explainable Convolutional LSTM Encoder-Decoder to Detect Cyber Attacks

Sefi Akerman, Edan Habler, Asaf Shabtai

The purpose of the automatic dependent surveillance broadcast (ADS-B) technology is to serve as a replacement for the current radar-based, air traffic control systems. Despite the considerable time and resources devoted to designing and developing the system, the ADS-B is well known for its lack of security mechanisms. Attempts to address these security vulnerabilities have been made in previous studies by modifying the protocol's current architecture or by using additional hardware components. These solutions, however, are considered impractical because of 1) the complex regulatory process involving avionic systems, 2) the high costs of using hardware components, and 3) the fact that the ADS-B system itself is already deployed in most aircraft and ground stations around the world. In this paper, we propose VizADS-B, an alternative software-based security solution for detecting anomalous ADS-B messages, which does not require any alteration of the current ADS-B architecture or the addition of sensors. According to the proposed method, the information obtained from all aircraft within a specific geographical area is aggregated and represented as a stream of images. Then, a convolutional LSTM encoder-decoder model is used for analyzing and detecting anomalies in the sequences of images. In addition, we propose an explainability technique, designed specifically for convolutional LSTM encoder-decoder models, which is used for providing operative information to the pilot as a visual indicator of a detected anomaly, thus allowing the pilot to make relevant decisions. We evaluated our proposed method on five datasets by injecting and subsequently identifying five different attacks. Our experiments demonstrate that most of the attacks can be detected based on spatio-temporal anomaly detection approach.

CRNov 28, 2017
Using LSTM Encoder-Decoder Algorithm for Detecting Anomalous ADS-B Messages

Edan Habler, Asaf Shabtai

Although the ADS-B system is going to play a major role in the safe navigation of airplanes and air traffic control (ATC) management, it is also well known for its lack of security mechanisms. Previous research has proposed various methods for improving the security of the ADS-B system and mitigating associated risks. However, these solutions typically require the use of additional participating nodes (or sensors) (e.g., to verify the location of the airplane by analyzing the physical signal) or modification of the current protocol architecture (e.g., adding encryption or authentication mechanisms.) Due to the regulation process regarding avionic systems and the fact that the ADS-B system is already deployed in most airplanes, applying such modifications to the current protocol at this stage is impractical. In this paper we propose an alternative security solution for detecting anomalous ADS-B messages aimed at the detection of spoofed or manipulated ADS- B messages sent by an attacker or compromised airplane. The proposed approach utilizes an LSTM encoder-decoder algorithm for modeling flight routes by analyzing sequences of legitimate ADS-B messages. Using these models, aircraft can autonomously evaluate received ADS-B messages and identify deviations from the legitimate flight path (i.e., anomalies). We examined our approach on six different flight route datasets to which we injected different types of anomalies. Using our approach we were able to detect all of the injected attacks with an average false alarm rate of 4.3% for all of datasets.