CRAug 27, 2025Code
Robustness Assessment and Enhancement of Text Watermarking for Google's SynthIDXia Han, Qi Li, Jianbing Ni et al.
Recent advances in LLM watermarking methods such as SynthID-Text by Google DeepMind offer promising solutions for tracing the provenance of AI-generated text. However, our robustness assessment reveals that SynthID-Text is vulnerable to meaning-preserving attacks, such as paraphrasing, copy-paste modifications, and back-translation, which can significantly degrade watermark detectability. To address these limitations, we propose SynGuard, a hybrid framework that combines the semantic alignment strength of Semantic Information Retrieval (SIR) with the probabilistic watermarking mechanism of SynthID-Text. Our approach jointly embeds watermarks at both lexical and semantic levels, enabling robust provenance tracking while preserving the original meaning. Experimental results across multiple attack scenarios show that SynGuard improves watermark recovery by an average of 11.1\% in F1 score compared to SynthID-Text. These findings demonstrate the effectiveness of semantic-aware watermarking in resisting real-world tampering. All code, datasets, and evaluation scripts are publicly available at: https://github.com/githshine/SynGuard.
CRJul 26, 2023
Unveiling Security, Privacy, and Ethical Concerns of ChatGPTXiaodong Wu, Ran Duan, Jianbing Ni
This paper delves into the realm of ChatGPT, an AI-powered chatbot that utilizes topic modeling and reinforcement learning to generate natural responses. Although ChatGPT holds immense promise across various industries, such as customer service, education, mental health treatment, personal productivity, and content creation, it is essential to address its security, privacy, and ethical implications. By exploring the upgrade path from GPT-1 to GPT-4, discussing the model's features, limitations, and potential applications, this study aims to shed light on the potential risks of integrating ChatGPT into our daily lives. Focusing on security, privacy, and ethics issues, we highlight the challenges these concerns pose for widespread adoption. Finally, we analyze the open problems in these areas, calling for concerted efforts to ensure the development of secure and ethically sound large language models.
CRMay 22
Security, Privacy, and Ethical Risks in OpenClawYutong Jin, Zelin Zhang, Zhijin Lyu et al.
This paper systematically investigates the security, privacy, and ethical risks, as well as the traceability challenges of OpenClaw, a locally executable AI agent system for natural language interaction and real-world task completion. While OpenClaw shows strong potential for personal assistance, office automation, cross-platform task management, and information integration, it also raises serious security, privacy, and ethical concerns. By analyzing its system architecture, core functionalities, deployment model, and representative application scenarios, this paper aims to reveal the risks that may arise when such a highly privileged agent is integrated into personal and organizational digital environments. We focus in particular on the challenges associated with persistent local storage, tool invocation, cross-context information aggregation, multi-user interaction, and the integration of plugins and external services. We argue that these issues constitute major barriers to the trustworthy deployment and widespread adoption of this technology. Finally, we summarize the open challenges in security defenses, privacy protection, ethical governance, and traceability in agent use, and call for joint efforts from researchers, developers, deployers, and regulators to build AI agent systems that are safer, more reliable, and more trustworthy.
CVMay 19
Are Watermarked Images Editable? SafeMark for Watermark-Preserving Text-Guided Image EditingXiaodong Wu, Qi Li, Xiangman Li et al.
This paper investigates a fundamental yet underexplored question: can watermarked images remain editable without compromising watermark integrity? We propose SafeMark, a framework for watermark-preserving text-guided image manipulation that explicitly integrates watermark integrity into the editing process. Specifically, SafeMark adds a thresholded watermark-decoding loss directly to the diffusion editor's training objective, fine-tuning the editor so that semantically valid edits also preserve the embedded watermark at the final output. This design admits a clean information-theoretic justification: maintaining high bit-accuracy on the edited image lower-bounds the mutual information that the editor channel preserves between watermark and edited output, the quantity that fundamentally controls watermark recoverability. SafeMark is compatible with differentiable diffusion-based editors, and requires no architectural modification. Extensive evaluations across multiple datasets, text-guided editing methods, and post-edit distortion settings demonstrate that SafeMark achieves high watermark bit accuracy across diverse editing settings while maintaining high-quality semantic edits, without sacrificing robustness to common post-edit distortions. These results demonstrate that semantic editability and watermark integrity are fundamentally compatible, enabling trustworthy image provenance in generative editing pipelines.
CRMay 15
From AI-Generated Content to Agentic Action: Security and Safety Threats in Generative AIZelin Zhang, Qi Li, Jie Cao et al.
Generative AI systems are increasingly used not only to produce content but also to retrieve data, invoke tools, and execute actions. This work examines the security and safety implications of that shift across content-level, model-level, and agentic threats. We analyze how attacker access requirements, system autonomy, and the scope of potential harm change as models move from generating artifacts to executing operations through tool chains and external APIs. We then assess technical countermeasures including detection, watermarking, alignment, and emerging agentic safeguards, and show that several depend on forms of institutional coordination that current governance arrangements do not yet provide. Across the cases examined, capability deployment and attack-surface expansion repeatedly outpace defensive responses as systems move from generating content to executing real-world actions.
SDMay 2
MelShield: Robust Mel-Domain Audio Watermarking for Provenance Attribution of AI Generated Synthesized SpeechYutong Jin, Qi Li, Lingshuang Liu et al.
In this paper, we propose MelShield, a robust, in-generation, keyed audio watermarking framework that embeds identifiable signals into AI-generated audio for copyright protection and reliable attribution. Specifically, MelShield operates in the Mel-spectrogram domain during the generation process, targeting intermediate acoustic representations in Mel-conditioned pipelines for text-to-speech (TTS) generation. The core idea is to treat the intermediate Mel-spectrogram as the host signal and embed a short binary payload via low-energy, keyed spread-spectrum perturbations distributed across carefully selected time-frequency regions prior to waveform synthesis. By performing watermarking before vocoder inference, MelShield remains plug-and-play for Mel-conditioned TTS architectures and does not require modification or retraining of the underlying TTS generation vocoder, such as DiffWave and HiFi-GAN. Moreover, the multi-user keyed construction enables scalable user-specific attribution, while the keyed verification mechanism limits unauthorized decoding, thereby reducing the risk of large-scale extractor probing and adversarial analysis. Extensive experiments on DiffWave and HiFi-GAN demonstrate that MelShield achieves reliable watermark extraction, approaching 100\% bit accuracy, even under signal distortions, e.g., compression and additive noise, while preserving high perceptual audio quality.
CLSep 5, 2024
Bypassing DARCY Defense: Indistinguishable Universal Adversarial TriggersZuquan Peng, Yuanyuan He, Jianbing Ni et al.
Neural networks (NN) classification models for Natural Language Processing (NLP) are vulnerable to the Universal Adversarial Triggers (UAT) attack that triggers a model to produce a specific prediction for any input. DARCY borrows the "honeypot" concept to bait multiple trapdoors, effectively detecting the adversarial examples generated by UAT. Unfortunately, we find a new UAT generation method, called IndisUAT, which produces triggers (i.e., tokens) and uses them to craft adversarial examples whose feature distribution is indistinguishable from that of the benign examples in a randomly-chosen category at the detection layer of DARCY. The produced adversarial examples incur the maximal loss of predicting results in the DARCY-protected models. Meanwhile, the produced triggers are effective in black-box models for text generation, text inference, and reading comprehension. Finally, the evaluation results under NN models for NLP tasks indicate that the IndisUAT method can effectively circumvent DARCY and penetrate other defenses. For example, IndisUAT can reduce the true positive rate of DARCY's detection by at least 40.8% and 90.6%, and drop the accuracy by at least 33.3% and 51.6% in the RNN and CNN models, respectively. IndisUAT reduces the accuracy of the BERT's adversarial defense model by at least 34.0%, and makes the GPT-2 language model spew racist outputs even when conditioned on non-racial context.
CRJun 23, 2025Code
Security Assessment of DeepSeek and GPT Series Models against Jailbreak AttacksXiaodong Wu, Xiangman Li, Jianbing Ni
The widespread deployment of large language models (LLMs) has raised critical concerns over their vulnerability to jailbreak attacks, i.e., adversarial prompts that bypass alignment mechanisms and elicit harmful or policy-violating outputs. While proprietary models like GPT-4 have undergone extensive evaluation, the robustness of emerging open-source alternatives such as DeepSeek remains largely underexplored, despite their growing adoption in real-world applications. In this paper, we present the first systematic jailbreak evaluation of DeepSeek-series models, comparing them with GPT-3.5 and GPT-4 using the HarmBench benchmark. We evaluate seven representative attack strategies across 510 harmful behaviors categorized by both function and semantic domain. Our analysis reveals that DeepSeek's Mixture-of-Experts (MoE) architecture introduces routing sparsity that offers selective robustness against optimization-based attacks such as TAP-T, but leads to significantly higher vulnerability under prompt-based and manually engineered attacks. In contrast, GPT-4 Turbo demonstrates stronger and more consistent safety alignment across diverse behaviors, likely due to its dense Transformer design and reinforcement learning from human feedback. Fine-grained behavioral analysis and case studies further show that DeepSeek often routes adversarial prompts to under-aligned expert modules, resulting in inconsistent refusal behaviors. These findings highlight a fundamental trade-off between architectural efficiency and alignment generalization, emphasizing the need for targeted safety tuning and modular alignment strategies to ensure secure deployment of open-source LLMs.
CVJul 7, 2025Code
LAID: Lightweight AI-Generated Image Detection in Spatial and Spectral DomainsNicholas Chivaran, Jianbing Ni
The recent proliferation of photorealistic AI-generated images (AIGI) has raised urgent concerns about their potential misuse, particularly on social media platforms. Current state-of-the-art AIGI detection methods typically rely on large, deep neural architectures, creating significant computational barriers to real-time, large-scale deployment on platforms like social media. To challenge this reliance on computationally intensive models, we introduce LAID, the first framework -- to our knowledge -- that benchmarks and evaluates the detection performance and efficiency of off-the-shelf lightweight neural networks. In this framework, we comprehensively train and evaluate selected models on a representative subset of the GenImage dataset across spatial, spectral, and fusion image domains. Our results demonstrate that lightweight models can achieve competitive accuracy, even under adversarial conditions, while incurring substantially lower memory and computation costs compared to current state-of-the-art methods. This study offers valuable insight into the trade-off between efficiency and performance in AIGI detection and lays a foundation for the development of practical, scalable, and trustworthy detection systems. The source code of LAID can be found at: https://github.com/nchivar/LAID.
CRSep 30, 2025
Secure and Robust Watermarking for AI-generated Images: A Comprehensive SurveyJie Cao, Qi Li, Zelin Zhang et al.
The rapid advancement of generative artificial intelligence (Gen-AI) has facilitated the effortless creation of high-quality images, while simultaneously raising critical concerns regarding intellectual property protection, authenticity, and accountability. Watermarking has emerged as a promising solution to these challenges by distinguishing AI-generated images from natural content, ensuring provenance, and fostering trustworthy digital ecosystems. This paper presents a comprehensive survey of the current state of AI-generated image watermarking, addressing five key dimensions: (1) formalization of image watermarking systems; (2) an overview and comparison of diverse watermarking techniques; (3) evaluation methodologies with respect to visual quality, capacity, and detectability; (4) vulnerabilities to malicious attacks; and (5) prevailing challenges and future directions. The survey aims to equip researchers with a holistic understanding of AI-generated image watermarking technologies, thereby promoting their continued development.
LGAug 21, 2025
SafeLLM: Unlearning Harmful Outputs from Large Language Models against Jailbreak AttacksXiangman Li, Xiaodong Wu, Qi Li et al.
Jailbreak attacks pose a serious threat to the safety of Large Language Models (LLMs) by crafting adversarial prompts that bypass alignment mechanisms, causing the models to produce harmful, restricted, or biased content. In this paper, we propose SafeLLM, a novel unlearning-based defense framework that unlearn the harmful knowledge from LLMs while preserving linguistic fluency and general capabilities. SafeLLM employs a three-stage pipeline: (1) dynamic unsafe output detection using a hybrid approach that integrates external classifiers with model-internal evaluations; (2) token-level harmful content tracing through feedforward network (FFN) activations to localize harmful knowledge; and (3) constrained optimization to suppress unsafe behavior without degrading overall model quality. SafeLLM achieves targeted and irreversible forgetting by identifying and neutralizing FFN substructures responsible for harmful generation pathways. Extensive experiments on prominent LLMs (Vicuna, LLaMA, and GPT-J) across multiple jailbreak benchmarks show that SafeLLM substantially reduces attack success rates while maintaining high general-purpose performance. Compared to standard defense methods such as supervised fine-tuning and direct preference optimization, SafeLLM offers stronger safety guarantees, more precise control over harmful behavior, and greater robustness to unseen attacks. Moreover, SafeLLM maintains the general performance after the harmful knowledge unlearned. These results highlight unlearning as a promising direction for scalable and effective LLM safety.
CRJul 4, 2025
SecureT2I: No More Unauthorized Manipulation on AI Generated Images from PromptsXiaodong Wu, Xiangman Li, Qi Li et al.
Text-guided image manipulation with diffusion models enables flexible and precise editing based on prompts, but raises ethical and copyright concerns due to potential unauthorized modifications. To address this, we propose SecureT2I, a secure framework designed to prevent unauthorized editing in diffusion-based generative models. SecureT2I is compatible with both general-purpose and domain-specific models and can be integrated via lightweight fine-tuning without architectural changes. We categorize images into a permit set and a forbid set based on editing permissions. For the permit set, the model learns to perform high-quality manipulations as usual. For the forbid set, we introduce training objectives that encourage vague or semantically ambiguous outputs (e.g., blurred images), thereby suppressing meaningful edits. The core challenge is to block unauthorized editing while preserving editing quality for permitted inputs. To this end, we design separate loss functions that guide selective editing behavior. Extensive experiments across multiple datasets and models show that SecureT2I effectively degrades manipulation quality on forbidden images while maintaining performance on permitted ones. We also evaluate generalization to unseen inputs and find that SecureT2I consistently outperforms baselines. Additionally, we analyze different vagueness strategies and find that resize-based degradation offers the best trade-off for secure manipulation control.
LGJan 31, 2024
Manipulating Predictions over Discrete Inputs in Machine TeachingXiaodong Wu, Yufei Han, Hayssam Dahrouj et al.
Machine teaching often involves the creation of an optimal (typically minimal) dataset to help a model (referred to as the `student') achieve specific goals given by a teacher. While abundant in the continuous domain, the studies on the effectiveness of machine teaching in the discrete domain are relatively limited. This paper focuses on machine teaching in the discrete domain, specifically on manipulating student models' predictions based on the goals of teachers via changing the training data efficiently. We formulate this task as a combinatorial optimization problem and solve it by proposing an iterative searching algorithm. Our algorithm demonstrates significant numerical merit in the scenarios where a teacher attempts at correcting erroneous predictions to improve the student's models, or maliciously manipulating the model to misclassify some specific samples to the target class aligned with his personal profits. Experimental results show that our proposed algorithm can have superior performance in effectively and efficiently manipulating the predictions of the model, surpassing conventional baselines.
CRDec 6, 2020
Security and Privacy for Mobile Edge Caching: Challenges and SolutionsJianbing Ni, Kuan Zhang, Athanasios V. Vasilakos
Mobile edge caching is a promising technology for the next-generation mobile networks to effectively offer service environment and cloud-storage capabilities at the edge of networks. By exploiting the storage and computing resources at the network edge, mobile edge caching can significantly reduce service latency, decrease network load, and improve user experience. On the other hand, edge caching is subject to a number of threats regarding privacy violation and security breach. In this article, we first introduce the architecture of mobile edge caching, and address the key problems regarding why, where, what, and how to cache. Then, we examine the potential cyber threats, including cache poisoning attacks, cache pollution attacks, cache side-channel attacks, and cache deception attacks, which result in huge concerns about privacy, security, and trust in content placement, content delivery, and content usage for mobile users, respectively. After that, we propose a service-oriented and location-based efficient key distribution protocol (SOLEK) as an example in response to efficient and secure content delivery in mobile edge caching. Finally, we discuss the potential techniques for privacy-preserving content placement, efficient and secure content delivery, and trustful content usage, which are expected to draw more attention and efforts into secure edge caching.
CRJul 19, 2020
Private, Fair, and Verifiable Aggregate Statistics for Mobile Crowdsensing in Blockchain EraMiao He, Jianbing Ni, Dongxiao Liu et al.
In this paper, we propose FairCrowd, a private, fair, and verifiable framework for aggregate statistics in mobile crowdsensing based on the public blockchain. In specific, mobile users are incentivized to collect and share private data values (e.g., current locations) to fufill a commonly interested task released by a customer, and the crowdsensing server computes aggregate statistics over the values of mobile users (e.g., the most popular location) for the customer. By utilizing the ElGamal encryption, the server learns nearly nothing about the private data or the statistical result. The correctness of aggregate statistics can be publicly verified by using a new efficient and verifiable computation approach. Moreover, the fairness of incentive is guaranteed based on the public blockchain in the presence of greedy service provider, customers, and mobile users, who may launch payment-escaping, payment-reduction, free-riding, double-reporting, and Sybil attacks to corrupt reward distribution. Finally, FairCrowd is proved to achieve verifiable aggregate statistics with privacy preservation for mobile users. Extensive experiments are conducted to demonstrate the high efficiency of FairCrowd for aggregate statistics in mobile crowdsensing.
CRFeb 19, 2019
Towards Edge-assisted Internet of Things: From Security and Efficiency PerspectivesJianbing Ni, Xiaodong Lin, Xuemin et al.
As we are moving towards the Internet of Things (IoT) era, the number of connected physical devices is increasing at a rapid pace. Mobile edge computing is emerging to handle the sheer volume of produced data and reach the latency demand of computation-intensive IoT applications. Although the advance of mobile edge computing on service latency is studied solidly, security and efficiency on data usage in mobile edge computing have not been clearly identified. In this article, we examine the architecture of mobile edge computing and explore the potentials of utilizing mobile edge computing to enhance data analysis for IoT applications, while achieving data security and computational efficiency. Specifically, we first introduce the overall architecture and several promising edge-assisted IoT applications. We then study the security, privacy and efficiency challenges in data processing for mobile edge computing, and discuss the opportunities to enhance data security and improve computational efficiency with the assistance of edge computing, including secure data aggregation, secure data deduplication and secure computational offloading. Finally, several interesting directions on edge-empowered data analysis are presented for future research.
CRJun 11, 2018
Enabling Strong Privacy Preservation and Accurate Task Allocation for Mobile CrowdsensingJianbing Ni, Kuan Zhang, Qi Xia et al.
Mobile crowdsensing engages a crowd of individuals to use their mobile devices to cooperatively collect data about social events and phenomena for special interest customers. It can reduce the cost on sensor deployment and improve data quality with human intelligence. To enhance data trustworthiness, it is critical for service provider to recruit mobile users based on their personal features, e.g., mobility pattern and reputation, but it leads to the privacy leakage of mobile users. Therefore, how to resolve the contradiction between user privacy and task allocation is challenging in mobile crowdsensing. In this paper, we propose SPOON, a strong privacy-preserving mobile crowdsensing scheme supporting accurate task allocation from geographic information and credit points of mobile users. In SPOON, the service provider enables to recruit mobile users based on their locations, and select proper sensing reports according to their trust levels without invading user privacy. By utilizing proxy re-encryption and BBS+ signature, sensing tasks are protected and reports are anonymized to prevent privacy leakage. In addition, a privacy-preserving credit management mechanism is introduced to achieve decentralized trust management and secure credit proof for mobile users. Finally, we show the security properties of SPOON and demonstrate its efficiency on computation and communication.