LGOct 16, 2023Code
Ring-A-Bell! How Reliable are Concept Removal Methods for Diffusion Models?Yu-Lin Tsai, Chia-Yi Hsu, Chulin Xie et al.
Diffusion models for text-to-image (T2I) synthesis, such as Stable Diffusion (SD), have recently demonstrated exceptional capabilities for generating high-quality content. However, this progress has raised several concerns of potential misuse, particularly in creating copyrighted, prohibited, and restricted content, or NSFW (not safe for work) images. While efforts have been made to mitigate such problems, either by implementing a safety filter at the evaluation stage or by fine-tuning models to eliminate undesirable concepts or styles, the effectiveness of these safety measures in dealing with a wide range of prompts remains largely unexplored. In this work, we aim to investigate these safety mechanisms by proposing one novel concept retrieval algorithm for evaluation. We introduce Ring-A-Bell, a model-agnostic red-teaming tool for T2I diffusion models, where the whole evaluation can be prepared in advance without prior knowledge of the target model. Specifically, Ring-A-Bell first performs concept extraction to obtain holistic representations for sensitive and inappropriate concepts. Subsequently, by leveraging the extracted concept, Ring-A-Bell automatically identifies problematic prompts for diffusion models with the corresponding generation of inappropriate content, allowing the user to assess the reliability of deployed safety mechanisms. Finally, we empirically validate our method by testing online services such as Midjourney and various methods of concept removal. Our results show that Ring-A-Bell, by manipulating safe prompting benchmarks, can transform prompts that were originally regarded as safe to evade existing safety mechanisms, thus revealing the defects of the so-called safety mechanisms which could practically lead to the generation of harmful contents. Our codes are available at https://github.com/chiayi-hsu/Ring-A-Bell.
CRMay 28
Harmless Yet Harmful: Neutral Prompting Attacks for Stealthy Hallucination Steering in Agent SkillsChia-Yi Hsu, Chia-Mu Yu, Chun-Ying Huang et al.
LLM-powered coding agents increasingly participate in software development workflows by generating code, selecting dependencies, and producing package installation commands. This creates a new software supply chain risk: when an agent hallucinates a non-existent package, an attacker may register the hallucinated name and later compromise users who install it. Existing package hallucination attacks and defenses primarily focus on naturally occurring hallucinations, targeted dependency steering, or post-hoc package validation. In this paper, we introduce \emph{Neutral Prompting Attack} (NPA), a highly stealthy attack paradigm in which semantically benign instructions, such as encouraging imagination and exhaustiveness, increase package hallucination propensity without containing explicit malicious intent. Unlike targeted dependency steering, NPA does not specify an attacker-chosen package. Instead, it shifts the model's dependency generation behavior toward more speculative package names. We evaluate NPA across multiple coding-oriented LLMs and package hallucination benchmarks. Our results show that NPA increases both \emph{Hallucination ASR} and \emph{Pip Install ASR}, changes the distribution of hallucinated package names, and evades existing static-analysis, LLM-based, and agent-based Skill defenses. These findings reveal that harmless-looking prompts can covertly manipulate hallucination behavior and create downstream software supply chain risks.
CVApr 20, 2023
DPAF: Image Synthesis via Differentially Private Aggregation in Forward PhaseChih-Hsun Lin, Chia-Yi Hsu, Chia-Mu Yu et al.
Differentially private synthetic data is a promising alternative for sensitive data release. Many differentially private generative models have been proposed in the literature. Unfortunately, they all suffer from the low utility of the synthetic data, particularly for images of high resolutions. Here, we propose DPAF, an effective differentially private generative model for high-dimensional image synthesis. Different from the prior private stochastic gradient descent-based methods that add Gaussian noises in the backward phase during the model training, DPAF adds a differentially private feature aggregation in the forward phase, bringing advantages, including the reduction of information loss in gradient clipping and low sensitivity for the aggregation. Moreover, as an improper batch size has an adverse impact on the utility of synthetic data, DPAF also tackles the problem of setting a proper batch size by proposing a novel training strategy that asymmetrically trains different parts of the discriminator. We extensively evaluate different methods on multiple image datasets (up to images of 128x128 resolution) to demonstrate the performance of DPAF.
CRMay 10
Trust Me, Import This: Dependency Steering Attacks via Malicious Agent SkillsYiyong Liu, Chia-Yi Hsu, Chun-Ying Huang et al.
LLM-powered coding agents increasingly make software supply chain decisions. They generate imports, recommend packages, and write installation commands. Prior work showed that these systems can hallucinate non-existent package names, which attackers may register as malicious packages. In this paper, we show that this risk is not only a passive model failure. It can be actively induced through the persistent Skill artifact. We introduce Dependency Steering, an attack paradigm in which a malicious Skill biases a coding agent toward an attacker-controlled package during benign coding tasks. The attack does not require modifying model weights, training data, or user prompts. To construct realistic attacks, we design a Skill-level optimization method that searches for localized semantic edits that preserve the apparent purpose of the original Skill while increasing targeted package generation. Across multiple coding-oriented LLMs and programming benchmarks, Dependency Steering achieves high targeted hallucination rates, transfers across models and task domains, and remains difficult for evaluated Skill scanners and LLM-based auditors to detect. Our results show that persistent agent instructions form an underexplored software supply chain attack surface.
CLNov 2, 2024
CmdCaliper: A Semantic-Aware Command-Line Embedding Model and Dataset for Security ResearchSian-Yao Huang, Cheng-Lin Yang, Che-Yu Lin et al.
This research addresses command-line embedding in cybersecurity, a field obstructed by the lack of comprehensive datasets due to privacy and regulation concerns. We propose the first dataset of similar command lines, named CyPHER, for training and unbiased evaluation. The training set is generated using a set of large language models (LLMs) comprising 28,520 similar command-line pairs. Our testing dataset consists of 2,807 similar command-line pairs sourced from authentic command-line data. In addition, we propose a command-line embedding model named CmdCaliper, enabling the computation of semantic similarity with command lines. Performance evaluations demonstrate that the smallest version of CmdCaliper (30 million parameters) suppresses state-of-the-art (SOTA) sentence embedding models with ten times more parameters across various tasks (e.g., malicious command-line detection and similar command-line retrieval). Our study explores the feasibility of data generation using LLMs in the cybersecurity domain. Furthermore, we release our proposed command-line dataset, embedding models' weights and all program codes to the public. This advancement paves the way for more effective command-line embedding for future researchers.
LGJan 4, 2025
BADTV: Unveiling Backdoor Threats in Third-Party Task VectorsChia-Yi Hsu, Yu-Lin Tsai, Yu Zhe et al.
Task arithmetic in large-scale pre-trained models enables agile adaptation to diverse downstream tasks without extensive retraining. By leveraging task vectors (TVs), users can perform modular updates through simple arithmetic operations like addition and subtraction. Yet, this flexibility presents new security challenges. In this paper, we investigate how TVs are vulnerable to backdoor attacks, revealing how malicious actors can exploit them to compromise model integrity. By creating composite backdoors that are designed asymmetrically, we introduce BadTV, a backdoor attack specifically crafted to remain effective simultaneously under task learning, forgetting, and analogy operations. Extensive experiments show that BadTV achieves near-perfect attack success rates across diverse scenarios, posing a serious threat to models relying on task arithmetic. We also evaluate current defenses, finding they fail to detect or mitigate BadTV. Our results highlight the urgent need for robust countermeasures to secure TVs in real-world deployments.
CRMar 5, 2025
Data Poisoning Attacks to Locally Differentially Private Range Query ProtocolsTing-Wei Liao, Chih-Hsun Lin, Yu-Lin Tsai et al.
Local Differential Privacy (LDP) has been widely adopted to protect user privacy in decentralized data collection. However, recent studies have revealed that LDP protocols are vulnerable to data poisoning attacks, where malicious users manipulate their reported data to distort aggregated results. In this work, we present the first study on data poisoning attacks targeting LDP range query protocols, focusing on both tree-based and grid-based approaches. We identify three key challenges in executing such attacks, including crafting consistent and effective fake data, maintaining data consistency across levels or grids, and preventing server detection. To address the first two challenges, we propose novel attack methods that are provably optimal, including a tree-based attack and a grid-based attack, designed to manipulate range query results with high effectiveness. \textbf{Our key finding is that the common post-processing procedure, Norm-Sub, in LDP range query protocols can help the attacker massively amplify their attack effectiveness.} In addition, we study a potential countermeasure, but also propose an adaptive attack capable of evading this defense to address the third challenge. We evaluate our methods through theoretical analysis and extensive experiments on synthetic and real-world datasets. Our results show that the proposed attacks can significantly amplify estimations for arbitrary range queries by manipulating a small fraction of users, providing 5-10x more influence than a normal user to the estimation.
CVMar 20, 2025
VP-NTK: Exploring the Benefits of Visual Prompting in Differentially Private Data SynthesisChia-Yi Hsu, Jia-You Chen, Yu-Lin Tsai et al.
Differentially private (DP) synthetic data has become the de facto standard for releasing sensitive data. However, many DP generative models suffer from the low utility of synthetic data, especially for high-resolution images. On the other hand, one of the emerging techniques in parameter efficient fine-tuning (PEFT) is visual prompting (VP), which allows well-trained existing models to be reused for the purpose of adapting to subsequent downstream tasks. In this work, we explore such a phenomenon in constructing captivating generative models with DP constraints. We show that VP in conjunction with DP-NTK, a DP generator that exploits the power of the neural tangent kernel (NTK) in training DP generative models, achieves a significant performance boost, particularly for high-resolution image datasets, with accuracy improving from 0.644$\pm$0.044 to 0.769. Lastly, we perform ablation studies on the effect of different parameters that influence the overall performance of VP-NTK. Our work demonstrates a promising step forward in improving the utility of DP synthetic data, particularly for high-resolution images.
CRMar 6, 2025
Poisoning Attacks to Local Differential Privacy Protocols for Trajectory DataI-Jung Hsu, Chih-Hsun Lin, Chia-Mu Yu et al.
Trajectory data, which tracks movements through geographic locations, is crucial for improving real-world applications. However, collecting such sensitive data raises considerable privacy concerns. Local differential privacy (LDP) offers a solution by allowing individuals to locally perturb their trajectory data before sharing it. Despite its privacy benefits, LDP protocols are vulnerable to data poisoning attacks, where attackers inject fake data to manipulate aggregated results. In this work, we make the first attempt to analyze vulnerabilities in several representative LDP trajectory protocols. We propose \textsc{TraP}, a heuristic algorithm for data \underline{P}oisoning attacks using a prefix-suffix method to optimize fake \underline{Tra}jectory selection, significantly reducing computational complexity. Our experimental results demonstrate that our attack can substantially increase target pattern occurrences in the perturbed trajectory dataset with few fake users. This study underscores the urgent need for robust defenses and better protocol designs to safeguard LDP trajectory data against malicious manipulation.
CLFeb 27, 2025
Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following KnowledgeYan-Lun Chen, Yi-Ru Wei, Chia-Yi Hsu et al.
Large language models (LLMs) demonstrate strong task-specific capabilities through fine-tuning, but merging multiple fine-tuned models often leads to degraded performance due to overlapping instruction-following components. Task Arithmetic (TA), which combines task vectors derived from fine-tuning, enables multi-task learning and task forgetting but struggles to isolate task-specific knowledge from general instruction-following behavior. To address this, we propose Layer-Aware Task Arithmetic (LATA), a novel approach that assigns layer-specific weights to task vectors based on their alignment with instruction-following or task-specific components. By amplifying task-relevant layers and attenuating instruction-following layers, LATA improves task learning and forgetting performance while preserving overall model utility. Experiments on multiple benchmarks, including WikiText-2, GSM8K, and HumanEval, demonstrate that LATA outperforms existing methods in both multi-task learning and selective task forgetting, achieving higher task accuracy and alignment with minimal degradation in output quality. Our findings highlight the importance of layer-wise analysis in disentangling task-specific and general-purpose knowledge, offering a robust framework for efficient model merging and editing.
CYNov 9, 2020
An Empirical Evaluation of Bluetooth-based Decentralized Contact Tracing in CrowdsHsu-Chun Hsiao, Chun-Ying Huang, Shin-Ming Cheng et al.
Digital contact tracing is being used by many countries to help contain COVID-19's spread in a post-lockdown world. Among the various available techniques, decentralized contact tracing that uses Bluetooth received signal strength indication (RSSI) to detect proximity is considered less of a privacy risk than approaches that rely on collecting absolute locations via GPS, cellular-tower history, or QR-code scanning. As of October 2020, there have been millions of downloads of such Bluetooth-based contract-tracing apps, as more and more countries officially adopt them. However, the effectiveness of these apps in the real world remains unclear due to a lack of empirical research that includes realistic crowd sizes and densities. This study aims to fill that gap, by empirically investigating the effectiveness of Bluetooth-based contact tracing in crowd environments with a total of 80 participants, emulating classrooms, moving lines, and other types of real-world gatherings. The results confirm that Bluetooth RSSI is unreliable for detecting proximity, and that this inaccuracy worsens in environments that are especially crowded. In other words, this technique may be least useful when it is most in need, and that it is fragile when confronted by low-cost jamming. Moreover, technical problems such as high energy consumption and phone overheating caused by the contact-tracing app were found to negatively influence users' willingness to adopt it. On the bright side, however, Bluetooth RSSI may still be useful for detecting coarse-grained contact events, for example, proximity of up to 20m lasting for an hour. Based on our findings, we recommend that existing contact-tracing apps can be re-purposed to focus on coarse-grained proximity detection, and that future ones calibrate distance estimates and adjust broadcast frequencies based on auxiliary information.
CRMar 2, 2016
Decapitation via Digital Epidemics: A Bio-Inspired Transmissive AttackPin-Yu Chen, Ching-Chao Lin, Shin-Ming Cheng et al.
The evolution of communication technology and the proliferation of electronic devices have rendered adversaries powerful means for targeted attacks via all sorts of accessible resources. In particular, owing to the intrinsic interdependency and ubiquitous connectivity of modern communication systems, adversaries can devise malware that propagates through intermediate hosts to approach the target, which we refer to as transmissive attacks. Inspired by biology, the transmission pattern of such an attack in the digital space much resembles the spread of an epidemic in real life. This paper elaborates transmissive attacks, summarizes the utility of epidemic models in communication systems, and draws connections between transmissive attacks and epidemic models. Simulations, experiments, and ongoing research challenges on transmissive attacks are also addressed.