CR | May 29 | Code
TRACE: Task-Aware Adaptive Self-Evolving Agentic Jailbreaking
Churui Zeng, Weiwei Qi, Kedong Xiu et al.
The rise of LLM agents introduces a new threat by enabling planning, coding, and even end-to-end execution of expert-level attack workflows. However, this threat remains underexplored and underestimated since (i) safety alignment prevents LLMs from directly generating harmful instructions, and (ii) most existing jailbreak methods cannot consistently induce agents to execute malicious operations. In this paper, we propose TRACE, a practical agentic jailbreaking framework to further reveal the risks of this threat surface. To conceal the malicious intent, TRACE decomposes a malicious task into multiple subtask sequences under different schemes and selects the sequence with the fewest explicitly harmful subtasks. TRACE then disguises the remaining harmful subtasks as benign-looking instructions by embedding them in task-aware scenarios with related roles, environments, directives, and heuristics. The scenarios are iteratively evolved through well-defined transformation actions, which are sampled by a Q-learning-inspired mechanism, for inducing the agent to execute on the harmful subtasks. Extensive evaluations on AgentHarm and AdvCUA show that TRACE consistently outperforms existing jailbreak baselines across multiple advanced LLM agents, achieving up to 100% bypass rate and 0.73 average success score. We also demonstrate the effectiveness of TRACE in controlled cyberattack instances. Our code and demos are available at https://github.com/ZJU-LLM-Safety/TRACE.git.
CV | Sep 20, 2023 | Code
Locate and Verify: A Two-Stream Network for Improved Deepfake Detection
Chao Shuai, Jieming Zhong, Shuang Wu et al.
Deepfake has taken the world by storm, triggering a trust crisis. Current deepfake detection methods are typically inadequate in generalizability, with a tendency to overfit to image contents such as the background, which are frequently occurring but relatively unimportant in the training dataset. Furthermore, current methods heavily rely on a few dominant forgery regions and may ignore other equally important regions, leading to inadequate uncovering of forgery cues. In this paper, we strive to address these shortcomings from three aspects: (1) We propose an innovative two-stream network that effectively enlarges the potential regions from which the model extracts forgery evidence. (2) We devise three functional modules to handle the multi-stream and multi-scale features in a collaborative learning scheme. (3) Confronted with the challenge of obtaining forgery annotations, we propose a Semi-supervised Patch Similarity Learning strategy to estimate patch-level forged location annotations. Empirically, our method demonstrates significantly improved robustness and generalizability, outperforming previous methods on six benchmarks, and improving the frame-level AUC on the Deepfake Detection Challenge preview dataset from 0.797 to 0.835 and the video-level AUC on the CelebDF_v1 dataset from 0.811 to 0.847. Our implementation is available at https://github.com/sccsok/Locate-and-Verify.
CV | Sep 18, 2023 | Code
DFIL: Deepfake Incremental Learning by Exploiting Domain-invariant Forgery Clues
Kun Pan, Yin Yifang, Yao Wei et al.
The malicious use and widespread dissemination of deepfake pose a significant crisis of trust. Current deepfake detection models can generally recognize forgery images by training on a large dataset. However, the accuracy of detection models degrades significantly on images generated by new deepfake methods due to the difference in data distribution. To tackle this issue, we present a novel incremental learning framework that improves the generalization of deepfake detection models by continual learning from a small number of new samples. To cope with different data distributions, we propose to learn a domain-invariant representation based on supervised contrastive learning, preventing overfit to the insufficient new data. To mitigate catastrophic forgetting, we regularize our model in both feature-level and label-level based on a multi-perspective knowledge distillation approach. Finally, we propose to select both central and hard representative samples to update the replay set, which is beneficial for both domain-invariant representation learning and rehearsal-based knowledge preserving. We conduct extensive experiments on four benchmark datasets, obtaining the new state-of-the-art average forgetting rate of 7.01 and average accuracy of 85.49 on FF++, DFDC-P, DFD, and CDF2. Our code is released at https://github.com/DeepFakeIL/DFIL.
CR | Aug 4, 2022 | Code
MOVE: Effective and Harmless Ownership Verification via Embedded External Features
Yiming Li, Linghui Zhu, Xiaojun Jia et al.
Currently, deep neural networks (DNNs) are widely adopted in different applications. Despite their commercial value, training a well-performing DNN is resource-consuming. Accordingly, the well-trained model is valuable intellectual property for its owner. However, recent studies revealed the threats of model stealing, where the adversaries can obtain a function-similar copy of the victim model, even when they can only query the model. In this paper, we propose an effective and harmless model ownership verification (MOVE) to defend against different types of model stealing simultaneously, without introducing new security risks. In general, we conduct the ownership verification by verifying whether a suspicious model contains the knowledge of defender-specified external features. Specifically, we embed the external features by modifying a few training samples with style transfer. We then train a meta-classifier to determine whether a model is stolen from the victim. This approach is inspired by the understanding that the stolen models should contain the knowledge of features learned by the victim model. In particular, we develop our MOVE method under both white-box and black-box settings and analyze its theoretical foundation to provide comprehensive model protection. Extensive experiments on benchmark datasets verify the effectiveness of our method and its resistance to potential adaptive attacks. The code for reproducing the main experiments of our method is available at https://github.com/THUYimingLi/MOVE.
LG | Oct 4, 2022 | Code
OpBoost: A Vertical Federated Tree Boosting Framework Based on Order-Preserving Desensitization
Xiaochen Li, Yuke Hu, Weiran Liu et al.
Vertical Federated Learning (FL) is a new paradigm that enables users with non-overlapping attributes of the same data samples to jointly train a model without directly sharing the raw data. Nevertheless, recent works show that this is still not sufficient to prevent privacy leakage from the training process or the trained model. This paper focuses on studying privacy-preserving tree boosting algorithms under vertical FL. The existing solutions based on cryptography involve heavy computation and communication overhead and are vulnerable to inference attacks. Although the solution based on Local Differential Privacy (LDP) addresses the above problems, it leads to low accuracy of the trained model. This paper explores how to improve the accuracy of the widely deployed tree boosting algorithms satisfying differential privacy under vertical FL. Specifically, we introduce a framework called OpBoost. Three order-preserving desensitization algorithms satisfying a variant of LDP called distance-based LDP (dLDP) are designed to desensitize the training data. In particular, we optimize the dLDP definition and study efficient sampling distributions to further improve the accuracy and efficiency of the proposed algorithms. The proposed algorithms provide a trade-off between the privacy of pairs with large distance and the utility of desensitized values. Comprehensive evaluations show that OpBoost achieves better prediction accuracy of trained models than existing LDP approaches under reasonable settings. Our code is open source.
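To make the idea of distance-based LDP more concrete, the following is a minimal sketch, not one of the three algorithms from the paper. It assumes the usual distance-scaled guarantee, Pr[M(x) = y] <= exp(epsilon * |x - x'|) * Pr[M(x') = y], and uses an exponential-style mechanism over a small integer domain. The function names and the domain are hypothetical.

```python
import math
import random
from typing import List


def dldp_probabilities(x: int, domain: List[int], epsilon: float) -> List[float]:
    """Output distribution for input x: outputs near x are exponentially more likely.

    Assumption: weight exp(-epsilon * |x - y| / 2); the factor 1/2 accounts for the
    normalizer also changing when x changes.
    """
    weights = [math.exp(-epsilon * abs(x - y) / 2.0) for y in domain]
    total = sum(weights)
    return [w / total for w in weights]


def desensitize(x: int, domain: List[int], epsilon: float, rng: random.Random) -> int:
    """Sample a desensitized value; order is preserved only probabilistically."""
    probs = dldp_probabilities(x, domain, epsilon)
    return rng.choices(domain, weights=probs, k=1)[0]


domain = list(range(10))
eps = 1.0
p3 = dldp_probabilities(3, domain, eps)
p5 = dldp_probabilities(5, domain, eps)

# Empirical check of the distance-scaled bound for the pair (3, 5).
worst_ratio = max(a / b for a, b in zip(p3, p5))
bound = math.exp(eps * abs(3 - 5))

sample = desensitize(3, domain, eps, random.Random(0))
```

Because nearby values are the most likely outputs, comparisons between far-apart inputs mostly survive desensitization, which is the property a tree-boosting split search relies on. Close pairs receive stronger indistinguishability than distant ones.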
CV | Jun 13, 2023 | Code
Action Recognition with Multi-stream Motion Modeling and Mutual Information Maximization
Yuheng Yang, Haipeng Chen, Zhenguang Liu et al.
Action recognition has long been a fundamental and intriguing problem in artificial intelligence. The task is challenging due to the high-dimensional nature of an action, as well as the subtle motion details to be considered. Current state-of-the-art approaches typically learn from articulated motion sequences in the straightforward 3D Euclidean space. However, the vanilla Euclidean space is not efficient for modeling important motion characteristics such as the joint-wise angular acceleration, which reveals the driving force behind the motion. Moreover, current methods typically attend to each channel equally and lack theoretical constraints on extracting task-relevant features from the input. In this paper, we seek to tackle these challenges from three aspects: (1) We propose to incorporate an acceleration representation, explicitly modeling the higher-order variations in motion. (2) We introduce a novel Stream-GCN network equipped with multi-stream components and channel attention, where different representations (i.e., streams) supplement each other towards a more precise action recognition while attention capitalizes on those important channels. (3) We explore feature-level supervision for maximizing the extraction of task-relevant information and formulate this into a mutual information loss. Empirically, our approach sets the new state-of-the-art performance on three benchmark datasets, NTU RGB+D, NTU RGB+D 120, and NW-UCLA. Our code is anonymously released at https://github.com/ActionR-Group/Stream-GCN, hoping to inspire the community.
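As a rough illustration of what an acceleration representation of a skeleton sequence can look like, here is a minimal sketch. The abstract does not say how the acceleration stream is computed, so the second-order finite difference of joint coordinates below is an assumption standing in for the paper's (angular) formulation, and the function name is hypothetical.

```python
from typing import List, Sequence

Vec3 = Sequence[float]


def second_order_difference(frames: List[List[Vec3]]) -> List[List[List[float]]]:
    """frames[t][j] is the 3D position of joint j at frame t.

    Returns x[t+1] - 2*x[t] + x[t-1] for every interior frame t and every joint,
    a discrete stand-in for acceleration (assumption, not the paper's definition).
    """
    acc = []
    for t in range(1, len(frames) - 1):
        frame_acc = []
        for prev, cur, nxt in zip(frames[t - 1], frames[t], frames[t + 1]):
            frame_acc.append([n - 2.0 * c + p for p, c, n in zip(prev, cur, nxt)])
        acc.append(frame_acc)
    return acc


# One joint moving along x with position t^2, so the discrete acceleration is constant.
frames = [[(float(t * t), 0.0, 0.0)] for t in range(4)]
accel = second_order_difference(frames)
```

In a multi-stream setup, a tensor like `accel` would be fed to its own branch alongside the position and velocity streams.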
AI | Jun 4
QCFuse: Query-Aware Cache Fusion via Compressed View for Efficient RAG Serving
Jianxin Yan, Wangze Ni, Zhenxin Li et al.
Retrieval-augmented generation (RAG) improves large language model (LLM) answer quality by grounding generation in external evidence, but processing retrieved contexts makes the prefill stage a dominant serving cost. RAG cache fusion reduces this cost by reusing precomputed key-value (KV) caches for retrieved chunks and selectively recomputing tokens under the current prompt. Existing selectors, however, face a dilemma between quality and efficiency: fast query-agnostic or final-layer query-to-context selectors can miss request-relevant evidence, whereas full-view query-aware selectors require broad context and layer visibility before recomputation and therefore stall the layer-wise cache-fusion pipeline. We present QCFuse, a compressed-view query-aware selector for RAG cache fusion. QCFuse uses chunk-anchor query probing to condition user-query states on compact per-chunk anchors and critical-layer profiling to identify recomputation tokens without all-layer inspection. We implement QCFuse in SGLang and evaluate it on four open-weight LLMs across six datasets. QCFuse reaches full-prefill-level quality. At matched quality, QCFuse achieves an average prefill-time speedup of 1.7x over full prefill and 1.5x over ProphetKV, the strongest quality-preserving baseline.
CV | Aug 23, 2023 | Code
RemovalNet: DNN Fingerprint Removal Attacks
Hongwei Yao, Zheng Li, Kunzhe Huang et al.
With the performance of deep neural networks (DNNs) remarkably improving, DNNs have been widely used in many areas. Consequently, the DNN model has become a valuable asset, and its intellectual property is safeguarded by ownership verification techniques (e.g., DNN fingerprinting). However, the feasibility of the DNN fingerprint removal attack and its potential influence remains an open problem. In this paper, we perform the first comprehensive investigation of DNN fingerprint removal attacks. Generally, the knowledge contained in a DNN model can be categorized into general semantic and fingerprint-specific knowledge. To this end, we propose a min-max bilevel optimization-based DNN fingerprint removal attack named RemovalNet, to evade model ownership verification. The lower-level optimization is designed to remove fingerprint-specific knowledge, while in the upper-level optimization, we distill the victim model's general semantic knowledge to maintain the surrogate model's performance. We conduct extensive experiments to evaluate the fidelity, effectiveness, and efficiency of RemovalNet against four advanced defense methods on six metrics. The empirical results demonstrate that (1) RemovalNet is effective: after our DNN fingerprint removal attack, the model distance between the target and surrogate models is 100 times higher than that of the baseline attacks; (2) RemovalNet is efficient: it uses only 0.2% (400 samples) of the substitute dataset and 1,000 iterations to conduct our attack, and compared with advanced model stealing attacks, it saves up to nearly 85% of computational resources; (3) RemovalNet achieves high fidelity: the created surrogate model maintains high accuracy after the DNN fingerprint removal process. Our code is available at: https://github.com/grasses/RemovalNet.
CR | Oct 20, 2023 | Code
FLTracer: Accurate Poisoning Attack Provenance in Federated Learning
Xinyu Zhang, Qingyu Liu, Zhongjie Ba et al.
Federated Learning (FL) is a promising distributed learning approach that enables multiple clients to collaboratively train a shared global model. However, recent studies show that FL is vulnerable to various poisoning attacks, which can degrade the performance of global models or introduce backdoors into them. In this paper, we first conduct a comprehensive study on prior FL attacks and detection methods. The results show that all existing detection methods are only effective against limited and specific attacks. Most detection methods suffer from high false positives, which lead to significant performance degradation, especially in non-independent and identically distributed (non-IID) settings. To address these issues, we propose FLTracer, the first FL attack provenance framework to accurately detect various attacks and trace the attack time, objective, type, and poisoned location of updates. Different from existing methodologies that rely solely on cross-client anomaly detection, we propose a Kalman filter-based cross-round detection to identify adversaries by seeking the behavior changes before and after the attack. This makes it resilient to data heterogeneity and effective even in non-IID settings. To further improve the accuracy of our detection method, we employ four novel features and capture their anomalies with joint decisions. Extensive evaluations show that FLTracer achieves an average true positive rate of over 96.88% at an average false positive rate of less than 2.67%, significantly outperforming SOTA detection methods. Code is available at https://github.com/Eyr3/FLTracer.
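The cross-round idea can be sketched with a scalar Kalman filter that tracks one summary feature of a single client's updates over rounds and flags a round whose value departs sharply from the prediction. This is a simplified illustration under assumed settings (a random-walk state model, hand-picked noise variances, one hypothetical feature), not FLTracer's actual detector or feature set.

```python
from typing import List


def kalman_cross_round_flags(
    values: List[float],
    process_var: float = 1e-3,      # assumed drift of the client's normal behavior
    measurement_var: float = 1e-2,  # assumed round-to-round noise
    threshold: float = 3.0,         # flag if the innovation exceeds 3 standard deviations
) -> List[bool]:
    """Return one flag per round; True marks a suspicious behavior change."""
    if not values:
        return []
    estimate = values[0]
    variance = measurement_var
    flags = [False]
    for z in values[1:]:
        # Predict step for a random-walk model: the state stays, uncertainty grows.
        pred_var = variance + process_var
        innovation = z - estimate
        innovation_var = pred_var + measurement_var
        anomalous = abs(innovation) > threshold * innovation_var ** 0.5
        flags.append(anomalous)
        if anomalous:
            # Do not let a suspicious round pull the estimate toward itself.
            variance = pred_var
        else:
            gain = pred_var / innovation_var
            estimate = estimate + gain * innovation
            variance = (1.0 - gain) * pred_var
    return flags


# Hypothetical per-round feature (e.g. an update-norm statistic) for one client.
history = [0.50, 0.52, 0.49, 0.51, 0.50, 1.80, 0.51]
flags = kalman_cross_round_flags(history)
```

Because each client is compared against its own history rather than against other clients, a benign client whose data distribution differs from the rest is not penalized merely for being different.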
CV | Jun 20, 2023 | Code
Masked Diffusion Models Are Fast Distribution Learners
Jiachen Lei, Qinglong Wang, Peng Cheng et al.
The diffusion model has emerged as the de facto model for image generation, yet the heavy training overhead hinders its broader adoption in the research community. We observe that diffusion models are commonly trained to learn all fine-grained visual information from scratch. This paradigm may cause unnecessary training costs and hence requires in-depth investigation. In this work, we show that it suffices to train a strong diffusion model by first pre-training the model to learn some primer distribution that loosely characterizes the unknown real image distribution. Then the pre-trained model can be fine-tuned for various generation tasks efficiently. In the pre-training stage, we propose to mask a high proportion (e.g., up to 90%) of input images to approximately represent the primer distribution and introduce a masked denoising score matching objective to train a model to denoise visible areas. In the subsequent fine-tuning stage, we efficiently train the diffusion model without masking. Utilizing the two-stage training framework, we achieve significant training acceleration and a new FID score record of 6.27 on CelebA-HQ 256 × 256 for ViT-based diffusion models. The generalizability of a pre-trained model further helps build models that perform better than ones trained from scratch on different downstream datasets. For instance, a diffusion model pre-trained on VGGFace2 attains a 46% quality improvement when fine-tuned on a different dataset that contains only 3000 images. Our code is available at https://github.com/jiachenlei/maskdm.
CR | May 24 | Code
RouteScan: A Non-Intrusive Approach to Auditing MoE LLMs Safety via Expert Routing Telemetry
Bo Lv, Zhiheng Xu, KeDong Xiu et al.
Mixture-of-Experts (MoE) architectures have become an increasingly important paradigm for scaling Large Language Models (LLMs). As MoE models are increasingly deployed in real-world services, safety auditing becomes necessary to verify whether these models produce or facilitate harmful behaviors during operation. However, existing content-based auditing methods typically require access to user prompts, model inputs, or generated outputs, potentially exposing sensitive user information and creating a fundamental tension between LLM safety and user privacy. On the other hand, we observe that, in MoE models, sparse expert routing maps different inputs to activate different expert-execution patterns, producing measurable footprints in low-level GPU execution telemetry. Inspired by this observation, we propose RouteScan, a non-intrusive auditing framework for detecting harmful behaviors through GPU-level expert routing telemetry. Specifically, RouteScan utilizes the number of active GPU threads allocated to expert modules during the prefilling phase as a discriminative micro-architectural fingerprint, and builds a lightweight detection pipeline that isolates cross-domain invariant risk indicators for the precise identification of malicious prompts. Comprehensive evaluations on open-source MoE LLMs with distinct routing designs demonstrate that RouteScan achieves strong generalization, with an AUROC exceeding 0.93 on unseen harmful domains and 0.96 under novel jailbreak wrappers. Moreover, empirical inversion tests show that the collected expert routing telemetry provides limited information for prompt reconstruction, suggesting a practical privacy advantage over content-based auditing methods.
CV | Mar 22 | Code
JANUS: A Lightweight Framework for Jailbreaking Text-to-Image Models via Distribution Optimization
Haolun Zheng, Yu He, Tailun Chen et al.
Text-to-image (T2I) models such as Stable Diffusion and DALLE remain susceptible to generating harmful or Not-Safe-For-Work (NSFW) content under jailbreak attacks despite deployed safety filters. Existing jailbreak attacks either rely on proxy-loss optimization instead of the true end-to-end objective, or depend on large-scale and costly RL-trained generators. Motivated by these limitations, we propose JANUS, a lightweight framework that formulates jailbreak as optimizing a structured prompt distribution under a black-box, end-to-end reward from the T2I system and its safety filters. JANUS replaces a high-capacity generator with a low-dimensional mixing policy over two semantically anchored prompt distributions, enabling efficient exploration while preserving the target semantics. On modern T2I models, we outperform state-of-the-art jailbreak methods, improving ASR-8 from 25.30% to 43.15% on Stable Diffusion 3.5 Large Turbo with consistently higher CLIP and NSFW scores. JANUS succeeds across both open-source and commercial models. These findings expose structural weaknesses in current T2I safety pipelines and motivate stronger, distribution-aware defenses. Warning: This paper contains model outputs that may be offensive.
CR | Nov 14, 2022
FedTracker: Furnishing Ownership Verification and Traceability for Federated Learning Model
Shuo Shao, Wenyuan Yang, Hanlin Gu et al.
Federated learning (FL) is a distributed machine learning paradigm allowing multiple clients to collaboratively train a global model without sharing their local data. However, FL entails exposing the model to various participants. This poses a risk of unauthorized model distribution or resale by a malicious client, compromising the intellectual property rights of the FL group. To deter such misbehavior, it is essential to establish a mechanism for verifying the ownership of the model as well as tracing its origin to the leaker among the FL participants. In this paper, we present FedTracker, the first FL model protection framework that provides both ownership verification and traceability. FedTracker adopts a bi-level protection scheme consisting of a global watermark mechanism and a local fingerprint mechanism. The former authenticates the ownership of the global model, while the latter identifies which client the model is derived from. FedTracker leverages Continual Learning (CL) principles to embed the watermark in a way that preserves the utility of the FL model on both the primitive task and the watermark task. FedTracker also devises a novel metric to better discriminate different fingerprints. Experimental results show that FedTracker is effective in ownership verification and traceability, and maintains good fidelity and robustness against various watermark removal attacks.
CV | Sep 25, 2023
SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution
Zhongjie Ba, Jieming Zhong, Jiachen Lei et al.
Advanced text-to-image models such as DALL·E 2 and Midjourney possess the capacity to generate highly realistic images, raising significant concerns regarding the potential proliferation of unsafe content. This includes adult, violent, or deceptive imagery of political figures. Despite claims of rigorous safety mechanisms implemented in these models to restrict the generation of not-safe-for-work (NSFW) content, we successfully devise and exhibit the first prompt attacks on Midjourney, resulting in the production of abundant photorealistic NSFW images. We reveal the fundamental principles of such prompt attacks and suggest strategically substituting high-risk sections within a suspect prompt to evade closed-source safety measures. Our novel framework, SurrogatePrompt, systematically generates attack prompts, utilizing large language models, image-to-text, and image-to-image modules to automate attack prompt creation at scale. Evaluation results disclose an 88% success rate in bypassing Midjourney's proprietary safety filter with our attack prompts, leading to the generation of counterfeit images depicting political figures in violent scenarios. Both subjective and objective assessments validate that the images generated from our attack prompts present considerable safety hazards.
LG | Dec 1, 2022
Purifier: Defending Data Inference Attacks via Transforming Confidence Scores
Ziqi Yang, Lijin Wang, Da Yang et al.
Neural networks are susceptible to data inference attacks such as the membership inference attack, the adversarial model inversion attack and the attribute inference attack, where the attacker could infer useful information such as the membership, the reconstruction or the sensitive attributes of a data sample from the confidence scores predicted by the target classifier. In this paper, we propose a method, namely PURIFIER, to defend against membership inference attacks. It transforms the confidence score vectors predicted by the target classifier and makes purified confidence scores indistinguishable in individual shape, statistical distribution and prediction label between members and non-members. The experimental results show that PURIFIER helps defend against membership inference attacks with high effectiveness and efficiency, outperforming previous defense methods, while incurring negligible utility loss. Besides, our further experiments show that PURIFIER is also effective in defending against adversarial model inversion attacks and attribute inference attacks. For example, the inversion error is raised about 4+ times on the Facescrub530 classifier, and the attribute inference accuracy drops significantly when PURIFIER is deployed in our experiment.
CL | Feb 19 | Code
Towards Cross-lingual Values Assessment: A Consensus-Pluralism Perspective
Yukun Chen, Xinyu Zhang, Jialong Tang et al.
While large language models (LLMs) have become pivotal to content safety, current evaluation paradigms primarily focus on detecting explicit harms (e.g., violence or hate speech), neglecting the subtler value dimensions conveyed in digital content. To bridge this gap, we introduce X-Value, a novel Cross-lingual Values Assessment Benchmark designed to evaluate LLMs' ability to assess deep-level values of content from a global perspective. X-Value consists of more than 5,000 QA pairs across 18 languages, systematically organized into 7 core domains grounded in Schwartz's Theory of Basic Human Values and categorized into easy and hard levels for discriminative evaluation. We further propose a unique two-stage annotation framework that first identifies whether an issue falls under global consensus (e.g., human rights) or pluralism (e.g., religion), and subsequently conducts a multi-party evaluation of the latent values embedded within the content. Systematic evaluations on X-Value reveal that current SOTA LLMs exhibit deficiencies in cross-lingual values assessment (Acc < 77%), with significant performance disparities across different languages (ΔAcc > 20%). This work highlights the urgent need to improve the nuanced, values-aware content assessment capability of LLMs. Our X-Value is available at: https://huggingface.co/datasets/Whitolf/X-Value.
LG | Mar 3, 2022
Fairness-aware Adversarial Perturbation Towards Bias Mitigation for Deployed Deep Models
Zhibo Wang, Xiaowei Dong, Henry Xue et al.
Prioritizing fairness is of central importance in artificial intelligence (AI) systems, especially for those societal applications, e.g., hiring systems should recommend applicants equally from different demographic groups, and risk assessment systems must eliminate racism in criminal justice. Existing efforts towards the ethical development of AI systems have leveraged data science to mitigate biases in the training set or introduced fairness principles into the training process. For a deployed AI system, however, it may not allow for retraining or tuning in practice. By contrast, we propose a more flexible approach, i.e., fairness-aware adversarial perturbation (FAAP), which learns to perturb input data to blind deployed models on fairness-related features, e.g., gender and ethnicity. The key advantage is that FAAP does not modify deployed models in terms of parameters and structures. To achieve this, we design a discriminator to distinguish fairness-related attributes based on latent representations from deployed models. Meanwhile, a perturbation generator is trained against the discriminator, such that no fairness-related features could be extracted from perturbed inputs. Exhaustive experimental evaluation demonstrates the effectiveness and superior performance of the proposed FAAP. In addition, FAAP is validated on real-world commercial deployments (inaccessible to model parameters), which shows the transferability of FAAP, foreseeing the potential of black-box adaptation.
CL | May 29
ConsisGuard: Aligning Safety Deliberation with Policy Enforcement in LLM Guardrails
Yan Wang, Zhixuan Chu, Zihao Xue et al.
Reasoning-based LLM guardrails improve safety moderation by generating explicit rationales before issuing final decisions. However, their rationales do not always lead to faithful enforcement: a model may recognize a harmful intent in its reasoning but still predict a safe label, or issue an unsafe decision without policy-grounded justification. We identify this safety-critical failure mode as the deliberation-to-enforcement gap. Unlike general chain-of-thought faithfulness, guardrail reliability requires policy execution consistency: the generated reasoning should be grounded in the safety policy, and the final decision should be entailed by that reasoning. We propose ConsisGuard, a consistency-aware framework for reasoning-based LLM guardrails. ConsisGuard performs Policy-to-Decision Trajectory Distillation and Functional Coupling Alignment, aligning the internal coupling between safety deliberation and decision enforcement. Experiments on prompt and response harmfulness detection benchmarks show that ConsisGuard improves detection performance while reducing policy execution failures. These results suggest that reliable reasoning-based guardrails require accurate and faithful execution of safety policies.
CR | Jul 31, 2023
Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks
Xinyu Zhang, Hanbin Hong, Yuan Hong et al.
Language models, especially the basic text classification models, have been shown to be susceptible to textual adversarial attacks such as synonym substitution and word insertion attacks. To defend against such attacks, a growing body of research has been devoted to improving model robustness. However, providing provable robustness guarantees instead of empirical robustness is still largely unexplored. In this paper, we propose Text-CRS, a generalized certified robustness framework for natural language processing (NLP) based on randomized smoothing. To the best of our knowledge, existing certified schemes for NLP can only certify the robustness against $\ell_0$ perturbations in synonym substitution attacks. Representing each word-level adversarial operation (i.e., synonym substitution, word reordering, insertion, and deletion) as a combination of permutation and embedding transformation, we propose novel smoothing theorems to derive robustness bounds in both permutation and embedding space against such adversarial operations. To further improve certified accuracy and radius, we consider the numerical relationships between discrete words and select proper noise distributions for the randomized smoothing. Finally, we conduct substantial experiments on multiple language models and datasets. Text-CRS can address all four different word-level adversarial operations and achieve a significant accuracy improvement. We also provide the first benchmark on certified accuracy and radius of four word-level operations, besides outperforming the state-of-the-art certification against synonym substitution attacks.
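The core of randomized smoothing, predicting by majority vote over randomly perturbed copies of the input, can be sketched for the synonym-substitution case as follows. This is a toy illustration only: the synonym table, the keyword classifier, and the uniform noise distribution are all hypothetical, and no certified radius is computed, so it should not be read as Text-CRS itself.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical synonym sets; each word maps to the set it may be resampled from.
SYNONYM_SET = ["good", "great", "fine"]
ANTONYM_SET = ["bad", "poor", "awful"]
SYNONYMS: Dict[str, List[str]] = {w: SYNONYM_SET for w in SYNONYM_SET}
SYNONYMS.update({w: ANTONYM_SET for w in ANTONYM_SET})


def base_classifier(words: List[str]) -> str:
    """Toy keyword classifier standing in for a real text model."""
    score = sum((w in SYNONYM_SET) - (w in ANTONYM_SET) for w in words)
    return "positive" if score >= 0 else "negative"


def smoothed_predict(words: List[str], n_samples: int, rng: random.Random) -> Tuple[str, float]:
    """Monte Carlo estimate of the smoothed classifier: majority label and its vote share."""
    votes: Counter = Counter()
    for _ in range(n_samples):
        # Noise: resample every word uniformly from its synonym set (assumed distribution).
        noisy = [rng.choice(SYNONYMS.get(w, [w])) for w in words]
        votes[base_classifier(noisy)] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_samples


rng = random.Random(0)
clean = smoothed_predict("the movie was good".split(), 200, rng)
attacked = smoothed_predict("the movie was fine".split(), 200, rng)
```

Since a word and its synonym-substituted replacement induce the same noise distribution here, the smoothed prediction cannot change under that attack. A certification procedure would turn the vote share into a provable bound.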
CV | Apr 19 | Code
Unveiling Deepfakes: A Frequency-Aware Triple Branch Network for Deepfake Detection
Qihao Shen, Jiaxing Xuan, Zhenguang Liu et al.
Advanced deepfake technologies are blurring the lines between real and fake, presenting both revolutionary opportunities and alarming threats. While it unlocks novel applications in fields like entertainment and education, its malicious use has sparked urgent ethical and societal concerns ranging from identity theft to the dissemination of misinformation. To tackle these challenges, feature analysis using frequency features has emerged as a promising direction for deepfake detection. However, one aspect that has been overlooked so far is that existing methods tend to concentrate on one or a few specific frequency domains, which risks overfitting to particular artifacts and significantly undermines their robustness when facing diverse forgery patterns. Another underexplored aspect we observe is that different features often attend to the same forged region, resulting in redundant feature representations and limiting the diversity of the extracted clues. This may undermine the ability of a model to capture complementary information across different facets, thereby compromising its generalization capability to diverse manipulations. In this paper, we seek to tackle these challenges from two aspects: (1) we propose a triple-branch network that jointly captures spatial and frequency features by learning from both original image and image reconstructed by different frequency channels, and (2) we mathematically derive feature decoupling and fusion losses grounded in the mutual information theory, which enhances the model to focus on task-relevant features across the original image and the image reconstructed by different frequency channels. Extensive experiments on six large-scale benchmark datasets demonstrate that our method consistently achieves state-of-the-art performance. Our code is released at https://github.com/injooker/Unveiling Deepfake.
CRMay 28
LoRA-Key: User-Centric LoRA Watermarking for Text-to-Image Diffusion ModelsYaopeng Wang, Qingliang Wang, Zhibo Wang et al.
Low-Rank Adaptation (LoRA) has become a widely used mechanism for customizing text-to-image diffusion models, enabling lightweight modules that are shared, reused, and commercialized as independent assets. This LoRA-centric ecosystem shifts copyright protection from foundation models to distributed LoRA modules, which are easy to copy, redistribute, or reuse without authorization. Existing watermarking methods either protect the base diffusion model or require watermark-aware retraining for each target LoRA, limiting their practicality in open community settings. To address this limitation, we propose LoRA-Key, a user-centric LoRA watermarking framework that treats copyright protection as a reusable ownership key. LoRA-Key encapsulates a recoverable secret message into a standalone user-specific Watermark LoRA, which can be attached to different target LoRAs through training-free linear superposition without per-LoRA retraining or structural modification. To train such a reusable key, we first establish a latent watermark prior in the frozen VAE latent space for robust message embedding and recovery, and then optimize the Watermark LoRA with message-conditioned watermark supervision and semantic consistency constraints. We further introduce Gradient Orthogonal Projection (GOP) to suppress watermark updates that conflict with semantic-preserving directions, reducing interference with generation fidelity and downstream style adaptation. Extensive experiments show that LoRA-Key provides lightweight plug-and-play copyright protection while preserving generation quality and style fidelity, and maintains robust ownership verification under image-level distortions, downstream fine-tuning, and multi-LoRA composition.
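The "training-free linear superposition" mentioned above can be illustrated with a minimal sketch. It assumes the common LoRA convention in which each adapter contributes a low-rank update `scale * B @ A` to a base weight matrix; the helper names (`lora_delta`, `superpose`) and the toy matrices are hypothetical and are not taken from the LoRA-Key paper.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def lora_delta(B, A, scale=1.0):
    """Low-rank weight update scale * (B @ A) for one adapter."""
    return [[scale * v for v in row] for row in matmul(B, A)]


def superpose(W, adapters):
    """Add every adapter's update to the base weights; no training involved."""
    out = W
    for B, A, scale in adapters:
        out = add(out, lora_delta(B, A, scale))
    return out


# Toy 2x2 base weight with two rank-1 adapters (illustrative values only).
W = [[1.0, 0.0], [0.0, 1.0]]
target_lora = ([[1.0], [0.0]], [[0.0, 2.0]], 1.0)
watermark_lora = ([[0.0], [1.0]], [[0.5, 0.0]], 1.0)

merged = superpose(W, [target_lora, watermark_lora])
print(merged)
```

Because the combination is a plain sum, the same watermark adapter can be attached to any target adapter of matching shape, and the order of attachment does not matter in this sketch.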
APNov 2, 2015
On the modeling and simulation of reaction-transfer dynamics in semiconductor-electrolyte solar cellsYuan He, Irene M. Gamba, Heung-Chan Lee et al.
The mathematical modeling and numerical simulation of semiconductor-electrolyte systems play important roles in the design of high-performance semiconductor-liquid junction solar cells. In this work, we propose a macroscopic mathematical model, a system of nonlinear partial differential equations, for the complete description of charge transfer dynamics in such systems. The model consists of a reaction-drift-diffusion-Poisson system that models the transport of electrons and holes in the semiconductor region and an equivalent system that describes the transport of reductants and oxidants, as well as other charged species, in the electrolyte region. The coupling between the semiconductor and the electrolyte is modeled through a set of interfacial reaction and current balance conditions. We present some numerical simulations to illustrate the quantitative behavior of the semiconductor-electrolyte system in both dark and illuminated environments. We show numerically that one can replace the electrolyte region in the system with a Schottky contact only when the bulk reductant-oxidant pair density is extremely high. Otherwise, such a replacement gives a significantly inaccurate description of the real dynamics of the semiconductor-electrolyte system.
CVNov 18, 2022
Masked Autoencoders for Egocentric Video Understanding @ Ego4D Challenge 2022Jiachen Lei, Shuang Ma, Zhongjie Ba et al.
In this report, we present our approach and empirical results of applying masked autoencoders in two egocentric video understanding tasks, namely, Object State Change Classification and PNR Temporal Localization, of Ego4D Challenge 2022. As team TheSSVL, we ranked 2nd place in both tasks. Our code will be made available.
OCFeb 8, 2023
Adaptive State-Dependent Diffusion for Derivative-Free OptimizationBjörn Engquist, Kui Ren, Yunan Yang
This paper develops and analyzes a stochastic derivative-free optimization strategy. A key feature is the state-dependent adaptive variance. We prove global convergence in probability with algebraic rate and give the quantitative results in numerical examples. A striking fact is that convergence is achieved without explicit information of the gradient and even without comparing different objective function values as in established methods such as the simplex method and simulated annealing. It can otherwise be compared to annealing with state-dependent temperature.
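A minimal sketch of the idea of a state-dependent adaptive variance follows. It is an illustration under stated assumptions, not the authors' algorithm: the two-level noise rule, the fixed threshold `f_cut`, and all parameter values are hypothetical. The point it shows is that each step uses only the objective value at the current point, with no gradient and no comparison between candidate points.

```python
import random


def sigma(fx, f_cut, small, large):
    """Noise level chosen from the current objective value only (assumed rule)."""
    return small if fx < f_cut else large


def state_dependent_search(f, x0, steps, f_cut, small=0.01, large=1.0, seed=0):
    """Random walk whose step size depends on f at the current state."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        # No gradient and no accept/reject test between two candidate points.
        x = x + sigma(f(x), f_cut, small, large) * rng.gauss(0.0, 1.0)
    return x


def objective(x):
    """Toy one-dimensional objective, for illustration only."""
    return (x - 2.0) ** 2


x_final = state_dependent_search(objective, x0=10.0, steps=1000, f_cut=0.05)
print(x_final, objective(x_final))
```

Where the objective is large the walk takes big steps and explores; where it is small the walk nearly freezes, which is the sense in which the abstract compares the method to annealing with a state-dependent temperature.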
CRNov 3, 2023
ERASER: Machine Unlearning in MLaaS via an Inference Serving-Aware ApproachYuke Hu, Jian Lou, Jiaqi Liu et al.
Over the past years, Machine Learning-as-a-Service (MLaaS) has received a surging demand for supporting Machine Learning-driven services to offer revolutionized user experience across diverse application areas. MLaaS provides inference service with low inference latency based on an ML model trained using a dataset collected from numerous individual data owners. Recently, for the sake of data owners' privacy and to comply with the "right to be forgotten (RTBF)" as enacted by data protection legislation, many machine unlearning methods have been proposed to remove data owners' data from trained models upon their unlearning requests. However, despite their promising efficiency, almost all existing machine unlearning methods handle unlearning requests independently from inference requests, which unfortunately introduces a new security issue of inference service obsolescence and a privacy vulnerability of undesirable exposure for machine unlearning in MLaaS. In this paper, we propose the ERASER framework for machinE unleaRning in MLaAS via an inferencE seRving-aware approach. ERASER strategically chooses appropriate unlearning execution timing to address the inference service obsolescence issue. A novel inference consistency certification mechanism is proposed to avoid the violation of the RTBF principle caused by postponed unlearning executions, thereby mitigating the undesirable exposure vulnerability. ERASER offers three groups of design choices to allow for tailor-made variants that best suit the specific environments and preferences of various MLaaS systems. Extensive empirical evaluations across various settings confirm ERASER's effectiveness, e.g., it can effectively save up to 99% of inference latency and 31% of computation overhead over the inference-oblivion baseline.
CRJun 20, 2023
FDINet: Protecting against DNN Model Extraction via Feature Distortion IndexHongwei Yao, Zheng Li, Haiqin Weng et al.
Machine Learning as a Service (MLaaS) platforms have gained popularity due to their accessibility, cost-efficiency, scalability, and rapid development capabilities. However, recent research has highlighted the vulnerability of cloud-based models in MLaaS to model extraction attacks. In this paper, we introduce FDINET, a novel defense mechanism that leverages the feature distribution of deep neural network (DNN) models. Concretely, by analyzing the feature distribution from the adversary's queries, we reveal that the feature distribution of these queries deviates from that of the model's training set. Based on this key observation, we propose Feature Distortion Index (FDI), a metric designed to quantitatively measure the feature distribution deviation of received queries. The proposed FDINET utilizes FDI to train a binary detector and exploits FDI similarity to identify colluding adversaries from distributed extraction attacks. We conduct extensive experiments to evaluate FDINET against six state-of-the-art extraction attacks on four benchmark datasets and four popular model architectures. Empirical results demonstrate the following findings: (1) FDINET proves to be highly effective in detecting model extraction, achieving a 100% detection accuracy on DFME and DaST. (2) FDINET is highly efficient, using just 50 queries to raise an extraction alarm with an average confidence of 96.08% for GTSRB. (3) FDINET exhibits the capability to identify colluding adversaries with an accuracy exceeding 91%. Additionally, it demonstrates the ability to detect two types of adaptive attacks.
NAJan 12, 2018
An implicit boundary integral method for computing electric potential of macromolecules in solventYimin Zhong, Kui Ren, Richard Tsai
A numerical method using implicit surface representations is proposed to solve the linearized Poisson-Boltzmann equations that arise in mathematical models for the electrostatics of molecules in solvent. The proposed method uses an implicit boundary integral formulation to derive a linear system defined on Cartesian nodes in a narrowband surrounding the closed surface that separates the molecule and the solvent. The needed implicit surface is constructed from the given atomic description of the molecules, by a sequence of standard level set algorithms. A fast multipole method is applied to accelerate the solution of the linear system. A few numerical studies involving some standard test cases are presented and compared to other existing results.
CRJul 23, 2024
RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language AgentHuiyu Xu, Wenhui Zhang, Zhibo Wang et al.
Recently, advanced Large Language Models (LLMs) such as GPT-4 have been integrated into many real-world applications like Code Copilot. These applications have significantly expanded the attack surface of LLMs, exposing them to a variety of threats. Among them, jailbreak attacks that induce toxic responses through jailbreak prompts have raised critical safety concerns. To identify these threats, a growing number of red teaming approaches simulate potential adversarial scenarios by crafting jailbreak prompts to test the target LLM. However, existing red teaming methods do not consider the unique vulnerabilities of LLM in different scenarios, making it difficult to adjust the jailbreak prompts to find context-specific vulnerabilities. Meanwhile, these methods are limited to refining jailbreak templates using a few mutation operations, lacking the automation and scalability to adapt to different scenarios. To enable context-aware and efficient red teaming, we abstract and model existing attacks into a coherent concept called "jailbreak strategy" and propose a multi-agent LLM system named RedAgent that leverages these strategies to generate context-aware jailbreak prompts. By self-reflecting on contextual feedback in an additional memory buffer, RedAgent continuously learns how to leverage these strategies to achieve effective jailbreaks in specific contexts. Extensive experiments demonstrate that our system can jailbreak most black-box LLMs in just five queries, improving the efficiency of existing red teaming methods by two times. Additionally, RedAgent can jailbreak customized LLM applications more efficiently. By generating context-aware jailbreak prompts towards applications on GPTs, we discover 60 severe vulnerabilities of these real-world applications with only two queries per vulnerability. We have reported all found issues and communicated with OpenAI and Meta for bug fixes.
APApr 28, 2017
Nonlinear quantitative photoacoustic tomography with two-photon absorptionKui Ren, Rongting Zhang
Two-photon photoacoustic tomography (TP-PAT) is a non-invasive optical molecular imaging modality that aims at inferring two-photon absorption property of heterogeneous media from photoacoustic measurements. In this work, we analyze an inverse problem in quantitative TP-PAT where we intend to reconstruct optical coefficients in a semilinear elliptic PDE, the mathematical model for the propagation of near infra-red photons in tissue-like optical media with two-photon absorption, from the internal absorbed energy data. We derive uniqueness and stability results on the reconstructions of single and multiple optical coefficients, and present some numerical reconstruction results based on synthetic data to complement the theoretical analysis.
OCApr 12, 2022
An Algebraically Converging Stochastic Gradient Descent Algorithm for Global OptimizationBjörn Engquist, Kui Ren, Yunan Yang
We propose a new gradient descent algorithm with added stochastic terms for finding the global optimizers of nonconvex optimization problems. A key component in the algorithm is the adaptive tuning of the randomness based on the value of the objective function. In the language of simulated annealing, the temperature is state-dependent. With this, we prove the global convergence of the algorithm with an algebraic rate both in probability and in the parameter space. This is a significant improvement over the classical rate from using a more straightforward control of the noise term. The convergence proof is based on the actual discrete setup of the algorithm, not just its continuous limit as often done in the literature. We also present several numerical examples to demonstrate the efficiency and robustness of the algorithm for reasonably complex objective functions.
CRApr 13, 2023
False Claims against Model Ownership ResolutionJian Liu, Rui Zhang, Sebastian Szyller et al.
Deep neural network (DNN) models are valuable intellectual property of model owners, constituting a competitive advantage. Therefore, it is crucial to develop techniques to protect against model theft. Model ownership resolution (MOR) is a class of techniques that can deter model theft. A MOR scheme enables an accuser to assert an ownership claim for a suspect model by presenting evidence, such as a watermark or fingerprint, to show that the suspect model was stolen or derived from a source model owned by the accuser. Most of the existing MOR schemes prioritize robustness against malicious suspects, ensuring that the accuser will win if the suspect model is indeed a stolen model. In this paper, we show that common MOR schemes in the literature are vulnerable to a different, equally important but insufficiently explored, robustness concern: a malicious accuser. We show how malicious accusers can successfully make false claims against independent suspect models that were not stolen. Our core idea is that a malicious accuser can deviate (without detection) from the specified MOR process by finding (transferable) adversarial examples that successfully serve as evidence against independent suspect models. To this end, we first generalize the procedures of common MOR schemes and show that, under this generalization, defending against false claims is as challenging as preventing (transferable) adversarial examples. Via systematic empirical evaluation, we show that our false claim attacks always succeed in the MOR schemes that follow our generalization, including in a real-world model: Amazon's Rekognition API.
CRAug 10, 2023
FINER: Enhancing State-of-the-art Classifiers with Feature Attribution to Facilitate Security AnalysisYiling He, Jian Lou, Zhan Qin et al.
Deep learning classifiers achieve state-of-the-art performance in various risk detection applications. They explore rich semantic representations and are supposed to automatically discover risk behaviors. However, due to the lack of transparency, the behavioral semantics cannot be conveyed to downstream security experts to reduce their heavy workload in security analysis. Although feature attribution (FA) methods can be used to explain deep learning, the underlying classifier is still blind to what behavior is suspicious, and the generated explanation cannot adapt to downstream tasks, incurring poor explanation fidelity and intelligibility. In this paper, we propose FINER, the first framework for risk detection classifiers to generate high-fidelity and high-intelligibility explanations. The high-level idea is to gather explanation efforts from model developer, FA designer, and security experts. To improve fidelity, we fine-tune the classifier with an explanation-guided multi-task learning strategy. To improve intelligibility, we engage task knowledge to adjust and ensemble FA methods. Extensive evaluations show that FINER improves explanation quality for risk detection. Moreover, we demonstrate that FINER outperforms a state-of-the-art tool in facilitating malware analysis.
CVJun 5, 2022
Vanilla Feature Distillation for Improving the Accuracy-Robustness Trade-Off in Adversarial TrainingGuodong Cao, Zhibo Wang, Xiaowei Dong et al.
Adversarial training has been widely explored for mitigating attacks against deep models. However, most existing works are still trapped in the dilemma between higher accuracy and stronger robustness since they tend to fit a model towards robust features (not easily tampered with by adversaries) while ignoring those non-robust but highly predictive features. To achieve a better robustness-accuracy trade-off, we propose the Vanilla Feature Distillation Adversarial Training (VFD-Adv), which conducts knowledge distillation from a pre-trained model (optimized towards high accuracy) to guide adversarial training towards higher accuracy, i.e., preserving those non-robust but predictive features. More specifically, both adversarial examples and their clean counterparts are forced to be aligned in the feature space by distilling predictive representations from the pre-trained/clean model, while previous works barely utilize predictive features from clean models. Therefore, the adversarial training model is updated towards maximally preserving accuracy while gaining robustness. A key advantage of our method is that it can be universally adapted to and boost existing works. Exhaustive experiments on various datasets, classification models, and adversarial training algorithms demonstrate the effectiveness of our proposed method.
CRAug 3, 2024
ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic FeaturesPeng Cheng, Yuwei Wang, Peng Huang et al.
Extensive research has revealed that adversarial examples (AE) pose a significant threat to voice-controllable smart devices. Recent studies have proposed black-box adversarial attacks that require only the final transcription from an automatic speech recognition (ASR) system. However, these attacks typically involve many queries to the ASR, resulting in substantial costs. Moreover, AE-based adversarial audio samples are susceptible to ASR updates. In this paper, we identify the root cause of these limitations, namely the inability to construct AE attack samples directly around the decision boundary of deep learning (DL) models. Building on this observation, we propose ALIF, the first black-box adversarial linguistic feature-based attack pipeline. We leverage the reciprocal process of text-to-speech (TTS) and ASR models to generate perturbations in the linguistic embedding space where the decision boundary resides. Based on the ALIF pipeline, we present the ALIF-OTL and ALIF-OTA schemes for launching attacks in both the digital domain and the physical playback environment on four commercial ASRs and voice assistants. Extensive evaluations demonstrate that ALIF-OTL and -OTA significantly improve query efficiency by 97.7% and 73.3%, respectively, while achieving competitive performance compared to existing methods. Notably, ALIF-OTL can generate an attack sample with only one query. Furthermore, our test-of-time experiment validates the robustness of our approach against ASR updates.
NADec 7, 2018
Characterizing impacts of model uncertainties in quantitative photoacousticsKui Ren, Sarah Vallélian
This work is concerned with uncertainty quantification problems for image reconstructions in quantitative photoacoustic imaging (PAT), a recent hybrid imaging modality that utilizes the photoacoustic effect to achieve high-resolution imaging of optical properties of tissue-like heterogeneous media. We quantify mathematically and computationally the impact of uncertainties in various model parameters of PAT on the accuracy of reconstructed optical properties. We derive, via sensitivity analysis, analytical bounds on error in image reconstructions in some simplified settings, and develop a computational procedure, based on the method of polynomial chaos expansion, for such error characterization in more general settings. Numerical simulations based on synthetic data are presented to illustrate the main ideas.
CRJul 6, 2024
Releasing Malevolence from Benevolence: The Menace of Benign Data on Machine UnlearningBinhao Ma, Tianhang Zheng, Hongsheng Hu et al.
Machine learning models trained on vast amounts of real or synthetic data often achieve outstanding predictive performance across various domains. However, this utility comes with increasing concerns about privacy, as the training data may include sensitive information. To address these concerns, machine unlearning has been proposed to erase specific data samples from models. While some unlearning techniques efficiently remove data at low costs, recent research highlights vulnerabilities where malicious users could request unlearning on manipulated data to compromise the model. Despite these attacks' effectiveness, perturbed data differs from original training data, failing hash verification. Existing attacks on machine unlearning also suffer from practical limitations and require substantial additional knowledge and resources. To fill the gaps in current unlearning attacks, we introduce the Unlearning Usability Attack. This model-agnostic, unlearning-agnostic, and budget-friendly attack distills data distribution information into a small set of benign data. These data are identified as benign by automatic poisoning detection tools due to their positive impact on model training. While benign for machine learning, unlearning these data significantly degrades model information. Our evaluation demonstrates that unlearning this benign data, comprising no more than 1% of the total training data, can reduce model accuracy by up to 50%. Furthermore, our findings show that well-prepared benign data poses challenges for recent unlearning techniques, as erasing these synthetic instances demands higher resources than regular data. These insights underscore the need for future research to reconsider "data poisoning" in the context of machine unlearning.
CRMay 10Code
"Training robust watermarking model may hurt authentication!" Exploring and Mitigating the Identity Leakage in Robust WatermarkingXinyu Zhang, Ziping Dong, Qingyu Liu et al.
The rapid advancement of generative AI has underscored the critical need for identifying image ownership and protecting copyrights. This makes post-processing image watermarking an essential tool -- it involves embedding a specific watermark message into an image, with successful verification if a similar message can be decoded from the watermarked image. However, this method is susceptible to both adversarial attacks that manipulate the watermarked image to yield an unverified message upon decoding, and the proposed identity leakage-related attacks (e.g., forging watermarked images). The threat of identity leakage is particularly exacerbated in both empirical and certified robust watermarking methods. To defend against the aforementioned attacks, we propose W-IR, the first image watermarking framework that simultaneously incorporates identity protection and robustness. To enhance model robustness, we introduce a novel randomized smoothing technique as part of robust watermarking that offers certified robustness against perturbations across two distinct transformation spaces: pixel-level and coordinate-level. Moreover, to further mitigate identity leakage, we propose a new strategy based on residual information loss, aimed at minimizing the mutual information between the residual and watermarked images. Our work strikes a superior balance between robustness and identity leakage mitigation. Extensive experiments demonstrate that our W-IR framework achieves high certified accuracy for authenticity while effectively reducing identity leakage. The code is available at https://github.com/holdrain/W-I-R.
SDNov 10, 2022
Privacy-Utility Balanced Voice De-Identification Using Adversarial ExamplesMeng Chen, Li Lu, Jiadi Yu et al.
Faced with the threat of identity leakage during voice data publishing, users are engaged in a privacy-utility dilemma when enjoying convenient voice services. Existing studies employ direct modification or text-based re-synthesis to de-identify users' voices, but this results in inconsistent audibility in the presence of human participants. In this paper, we propose a voice de-identification system, which uses adversarial examples to balance the privacy and utility of voice services. Instead of typical additive examples inducing perceivable distortions, we design a novel convolutional adversarial example that modulates perturbations into real-world room impulse responses. Benefiting from this, our system could preserve user identity from exposure by Automatic Speaker Identification (ASI) while retaining the voice perceptual quality for non-intrusive de-identification. Moreover, our system learns a compact speaker distribution through a conditional variational auto-encoder to sample diverse target embeddings on demand. Combining diverse target generation and input-specific perturbation construction, our system enables any-to-any identity transformation for adaptive de-identification. Experimental results show that our system could achieve 98% and 79% successful de-identification on mainstream ASIs and commercial systems with an objective Mel cepstral distortion of 4.31dB and a subjective mean opinion score of 4.48.
SESep 11, 2023
Enabling Runtime Verification of Causal Discovery Algorithms with Automated Conditional Independence Reasoning (Extended Version)Pingchuan Ma, Zhenlan Ji, Peisen Yao et al.
Causal discovery is a powerful technique for identifying causal relationships among variables in data. It has been widely used in various applications in software engineering. Causal discovery extensively involves conditional independence (CI) tests. Hence, its output quality highly depends on the performance of CI tests, which can often be unreliable in practice. Moreover, privacy concerns arise when excessive CI tests are performed. Despite the distinct nature between unreliable and excessive CI tests, this paper identifies a unified and principled approach to addressing both of them. Generally, CI statements, the outputs of CI tests, adhere to Pearl's axioms, which are a set of well-established integrity constraints on conditional independence. Hence, we can either detect erroneous CI statements if they violate Pearl's axioms or prune excessive CI statements if they are logically entailed by Pearl's axioms. Holistically, both problems boil down to reasoning about the consistency of CI statements under Pearl's axioms (referred to as CIR problem). We propose a runtime verification tool called CICheck, designed to harden causal discovery algorithms from reliability and privacy perspectives. CICheck employs a sound and decidable encoding scheme that translates CIR into SMT problems. To solve the CIR problem efficiently, CICheck introduces a four-stage decision procedure with three lightweight optimizations that actively prove or refute consistency, and only resort to costly SMT-based reasoning when necessary. Based on the decision procedure to CIR, CICheck includes two variants: ED-CICheck and ED-CICheck, which detect erroneous CI tests (to enhance reliability) and prune excessive CI tests (to enhance privacy), respectively. [abridged due to length limit]
CVMar 4, 2024Code
Exposing the Deception: Uncovering More Forgery Clues for Deepfake DetectionZhongjie Ba, Qingyu Liu, Zhenguang Liu et al.
Deepfake technology has given rise to a spectrum of novel and compelling applications. Unfortunately, the widespread proliferation of high-fidelity fake videos has led to pervasive confusion and deception, shattering our faith that seeing is believing. One aspect that has been overlooked so far is that current deepfake detection approaches may easily fall into the trap of overfitting, focusing only on forgery clues within one or a few local regions. Moreover, existing works heavily rely on neural networks to extract forgery features, lacking theoretical constraints guaranteeing that sufficient forgery clues are extracted and superfluous features are eliminated. These deficiencies culminate in unsatisfactory accuracy and limited generalizability in real-life scenarios. In this paper, we try to tackle these challenges through three designs: (1) We present a novel framework to capture broader forgery clues by extracting multiple non-overlapping local representations and fusing them into a global semantic-rich feature. (2) Based on the information bottleneck theory, we derive Local Information Loss to guarantee the orthogonality of local representations while preserving comprehensive task-relevant information. (3) Further, to fuse the local representations and remove task-irrelevant information, we arrive at a Global Information Loss through the theoretical analysis of mutual information. Empirically, our method achieves state-of-the-art performance on five benchmark datasets. Our code is available at https://github.com/QingyuLiu/Exposing-the-Deception, hoping to inspire researchers.
CRMay 1
Defense against Poisoning Attacks under Shuffle-DPSiyi Wang, Qiyao Luo, Yihua Hu et al.
Differential Privacy (DP) has become the gold standard for protecting individual privacy in data analytics, and the shuffle-DP model has attracted significant attention from both academia and industry due to its favorable balance between privacy and utility. However, existing shuffle-DP protocols rely on a strong assumption: all users behave honestly. In real-world scenarios, adversarial users can exploit this vulnerability through poisoning attacks, compromising both privacy guarantees and the utility of analytical results. While defending against poisoning attacks in the shuffle-DP model has recently gained interest, existing solutions are limited to frequency estimation tasks. To address this issue, we propose the first general defense framework for all union-preserving queries, capable of transforming any shuffle-DP protocol into a version resilient to poisoning attacks. Beyond robust defense against poisoning attacks, our framework achieves high utility of analytical results. Compared to the original shuffle-DP protocol, it retains asymptotically equivalent error in attack-free settings and incurs only a polylogarithmic increase in error when a constant number of attackers are present. We demonstrate the generality of our framework on several common queries, including summation, frequency estimation, and range counting. Experimental results confirm that our approach effectively defends against poisoning attacks while maintaining strong utility and communication efficiency.
LGApr 18
Channel-Level Semantic Perturbations: Unlearnable Examples for Diverse Training ParadigmsBo Wang, Jia Ni, Mengnan Zhao et al.
The unauthorized use of personal data in model training has emerged as a growing privacy threat. Unlearnable examples (UEs) address this issue by embedding imperceptible perturbations into benign examples to obstruct feature learning. However, existing studies mainly evaluate UEs under from-scratch training settings, leaving their behavior under the widely adopted pretraining-finetuning (PF) paradigm largely unexplored. In this work, we provide the first systematic investigation of unlearnable examples across diverse training paradigms. Our analysis reveals that loading and freezing pretrained weights significantly weakens the effectiveness of existing UE methods. We further explain these findings through semantic filtering: while UEs tend to induce models to overfit non-semantic noise, thereby weakening their semantic extraction capabilities, under the PF paradigm, frozen shallow layers preserve data semantics, effectively filtering out distracting information like unlearnable noise. Guided by these insights, we propose a hierarchical deception strategy, Shallow Semantic Camouflage (SSC), that confines the generation process to a semantically valid subspace, aiming to bypass the semantic suppression introduced by pretrained weights. Extensive experiments demonstrate that our method consistently preserves data unlearnability even under challenging training paradigms, such as shallow-layer freezing and semantic-focused pretraining (SF-Pretrain), bridging the critical gap in pretrain-based unlearnable learning.
CV Mar 10
When Detectors Forget Forensics: Blocking Semantic Shortcuts for Generalizable AI-Generated Image Detection
Chao Shuai, Zhenguang Liu, Shaojing Fan et al.
AI-generated image detection has become increasingly important with the rapid advancement of generative AI. However, detectors built on Vision Foundation Models (VFMs, e.g., CLIP) often struggle to generalize to images created using unseen generation pipelines. We identify, for the first time, a key failure mechanism, termed semantic fallback, where VFM-based detectors rely on dominant pre-trained semantic priors (such as identity) rather than forgery-specific traces under distribution shifts. To address this issue, we propose Geometric Semantic Decoupling (GSD), a parameter-free module that explicitly removes semantic components from learned representations by leveraging a frozen VFM as a semantic guide with a trainable VFM as an artifact detector. GSD estimates semantic directions from batch-wise statistics and projects them out via a geometric constraint, forcing the artifact detector to rely on semantic-invariant forensic evidence. Extensive experiments demonstrate that our method consistently outperforms state-of-the-art approaches, achieving 94.4% video-level AUC (+1.2%) in cross-dataset evaluation, improving robustness to unseen manipulations (+3.0% on DF40), and generalizing beyond faces to the detection of synthetic images of general scenes, including UniversalFakeDetect (+0.9%) and GenImage (+1.7%).
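To make the "project them out" step concrete, here is a minimal sketch of generic orthogonal projection, not the authors' GSD implementation. It assumes, purely for illustration, that a semantic direction is the batch mean of frozen-model features; the names `batch_mean_direction`, `orthonormalize`, and `project_out` are hypothetical.

```python
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def batch_mean_direction(frozen_feats):
    """Assumed estimator: the mean of the frozen-model features in a batch."""
    n = len(frozen_feats)
    return [sum(col) / n for col in zip(*frozen_feats)]

def orthonormalize(directions, eps=1e-12):
    """Gram-Schmidt: turn raw direction vectors into an orthonormal basis."""
    basis = []
    for d in directions:
        r = list(d)
        for b in basis:
            c = dot(r, b)
            r = [ri - c * bi for ri, bi in zip(r, b)]
        norm = sqrt(dot(r, r))
        if norm > eps:  # skip directions that are (numerically) dependent
            basis.append([ri / norm for ri in r])
    return basis

def project_out(feature, basis):
    """Remove the component of `feature` that lies in span(basis)."""
    out = list(feature)
    for b in basis:
        c = dot(out, b)
        out = [oi - c * bi for oi, bi in zip(out, b)]
    return out

# Toy batch: the frozen features share a dominant first coordinate.
frozen = [[1.0, 0.0, 0.0], [1.0, 0.2, 0.0], [1.0, -0.2, 0.0]]
basis = orthonormalize([batch_mean_direction(frozen)])
cleaned = project_out([0.9, 0.3, 0.4], basis)
```

After projection, `cleaned` has no component along the estimated direction, so a downstream classifier can only use what remains.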
CL Feb 16
Explainable Token-level Noise Filtering for LLM Fine-tuning Datasets
Yuchen Yang, Wenze Lin, Enhao Huang et al.
Large Language Models (LLMs) have seen remarkable advancements, achieving state-of-the-art results in diverse applications. Fine-tuning, an important step for adapting LLMs to specific downstream tasks, typically involves further training on corresponding datasets. However, a fundamental discrepancy exists between current fine-tuning datasets and the token-level optimization mechanism of LLMs: most datasets are designed at the sentence level, which introduces token-level noise and degrades final performance. In this paper, we propose XTF, an explainable token-level noise filtering framework. XTF decomposes the complex and subtle contributions of token-level data to the fine-tuning process into three distinct and explicit attributes (reasoning importance, knowledge novelty, and task relevance), which can be assessed using scoring methods, and then masks the gradients of selected noisy tokens accordingly to optimize the performance of fine-tuned LLMs. We conduct extensive experiments on three representative downstream tasks (math, code, and medicine) across 7 mainstream LLMs. The results demonstrate that XTF can significantly improve downstream performance by up to 13.7% compared to regular fine-tuning. Our work highlights the importance of token-level dataset optimization and demonstrates the potential of strategies based on attribute decomposition for explaining complex training mechanisms.
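The idea of masking noisy tokens can be illustrated with a small sketch. This is not XTF itself: the abstract does not say how the three attribute scores are computed or combined, so the simple average and the 0.5 threshold below are assumptions, and `token_mask` and `masked_loss` are hypothetical names. Dropping a token's term from the loss means it contributes no gradient.

```python
def token_mask(scores, threshold=0.5):
    """scores: per-token (reasoning, novelty, relevance) tuples in [0, 1].

    Assumed rule: keep a token (1) if the mean of its three scores reaches
    the threshold, otherwise mask it (0).
    """
    return [1 if sum(s) / 3 >= threshold else 0 for s in scores]

def masked_loss(token_losses, mask):
    """Average the per-token losses over kept tokens only."""
    kept = sum(mask)
    if kept == 0:
        return 0.0
    return sum(l * m for l, m in zip(token_losses, mask)) / kept

# Toy example: the second token scores low on all attributes and is masked.
scores = [(0.9, 0.7, 0.8), (0.1, 0.2, 0.3), (0.6, 0.5, 0.7)]
losses = [2.0, 1.0, 3.0]
mask = token_mask(scores)
loss = masked_loss(losses, mask)
```

In a real training loop the same effect is usually obtained by multiplying the per-token loss tensor by the mask before reduction.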
NA Oct 17, 2022
A Model-Consistent Data-Driven Computational Strategy for PDE Joint Inversion Problems
Kui Ren, Lu Zhang
The task of simultaneously reconstructing multiple physical coefficients in partial differential equations (PDEs) from observed data is ubiquitous in applications. In this work, we propose an integrated data-driven and model-based iterative reconstruction framework for such joint inversion problems where additional data on the unknown coefficients are supplemented for better reconstructions. Our method couples the supplementary data with the PDE model to make the data-driven modeling process consistent with the model-based reconstruction procedure. We characterize the impact of learning uncertainty on the joint inversion results for two typical inverse problems. Numerical evidence is provided to demonstrate the feasibility of using data-driven models to improve the joint inversion of multiple coefficients in PDEs.
LG Jul 17, 2022
Task-aware Similarity Learning for Event-triggered Time Series
Shaoyu Dou, Kai Yang, Yang Jiao et al.
Time series analysis has achieved great success in diverse applications such as network security, environmental monitoring, and medical informatics. Learning similarities among different time series is a crucial problem since it serves as the foundation for downstream analysis such as clustering and anomaly detection. It often remains unclear what kind of distance metric is suitable for similarity learning due to the complex temporal dynamics of the time series generated from event-triggered sensing, which is common in diverse applications, including automated driving, interactive healthcare, and smart home automation. The overarching goal of this paper is to develop an unsupervised learning framework that is capable of learning task-aware similarities among unlabeled event-triggered time series. From the machine learning vantage point, the proposed framework harnesses the power of both hierarchical multi-scale sequence autoencoders and a Gaussian Mixture Model (GMM) to effectively learn low-dimensional representations from the time series. Finally, the obtained similarity measure can be easily visualized for explanation. The proposed framework aspires to offer a stepping stone that gives rise to a systematic approach to model and learn similarities among a multitude of event-triggered time series. Through extensive qualitative and quantitative experiments, it is revealed that the proposed method outperforms state-of-the-art methods considerably.
SD Feb 5
HyperPotter: Spell the Charm of High-Order Interactions in Audio Deepfake Detection
Qing Wen, Haohao Li, Zhongjie Ba et al.
Advances in AIGC technologies have enabled the synthesis of highly realistic audio deepfakes capable of deceiving human auditory perception. Although numerous audio deepfake detection (ADD) methods have been developed, most rely on local temporal/spectral features or pairwise relations, overlooking high-order interactions (HOIs). HOIs capture discriminative patterns that emerge from multiple feature components beyond their individual contributions. We propose HyperPotter, a hypergraph-based framework that explicitly models these synergistic HOIs through clustering-based hyperedges with class-aware prototype initialization. Extensive experiments demonstrate that HyperPotter surpasses its baseline by an average relative gain of 22.15% across 11 datasets and outperforms state-of-the-art methods by 13.96% on 4 challenging cross-domain datasets, demonstrating superior generalization to diverse attacks and speakers.
CR May 23, 2024 Code
S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models
Xiaohan Yuan, Jinfeng Li, Dongxia Wang et al.
Generative large language models (LLMs) have revolutionized natural language processing with their transformative and emergent capabilities. However, recent evidence indicates that LLMs can produce harmful content that violates social norms, raising significant concerns regarding the safety and ethical ramifications of deploying these advanced models. Thus, it is both critical and imperative to perform a rigorous and comprehensive safety evaluation of LLMs before deployment. Despite this need, owing to the extensiveness of the LLM generation space, there is still no unified and standardized risk taxonomy that systematically reflects LLM content safety, nor are there automated safety assessment techniques that explore the potential risks efficiently. To bridge this gap, we propose S-Eval, a novel LLM-based automated Safety Evaluation framework with a newly defined comprehensive risk taxonomy. S-Eval incorporates two key components, i.e., an expert testing LLM $M_t$ and a novel safety critique LLM $M_c$. $M_t$ is responsible for automatically generating test cases in accordance with the proposed risk taxonomy. $M_c$ can provide quantitative and explainable safety evaluations for better risk awareness of LLMs. In contrast to prior works, S-Eval is efficient and effective in test generation and safety evaluation. Moreover, thanks to its LLM-based architecture, S-Eval can be flexibly configured and adapted to the rapid evolution of LLMs and the accompanying new safety threats, test generation methods, and safety critique methods. S-Eval has been deployed at our industrial partner for the automated safety evaluation of multiple LLMs serving millions of users, demonstrating its effectiveness in real-world scenarios. Our benchmark is publicly available at https://github.com/IS2Lab/S-Eval.
CL Apr 30 Code
APPSI-139: A Parallel Corpus of English Application Privacy Policy Summarization and Interpretation
Pengyun Zhu, Qiheng Sun, Long Wen et al.
Privacy policies are essential for users to understand how service providers handle their personal data. However, these documents are often long and complex, as well as filled with technobabble and legalese, causing users to unknowingly accept terms that may even contradict the law. While summarizing and interpreting these privacy policies is crucial, there is no high-quality English parallel corpus optimized for legal clarity and readability. To address this issue, we introduce APPSI-139, a high-quality English privacy policy corpus meticulously annotated by domain experts, specifically designed for summarization and interpretation tasks. The corpus includes 139 English privacy policies, 15,692 rewritten parallel corpora, and 36,351 fine-grained annotation labels across 11 data practice categories. Concurrently, we propose TCSI-pp-V2, a hybrid privacy policy summarization and interpretation framework that employs an alternating training strategy and coordinates multiple expert modules to effectively balance computational efficiency and accuracy. Experimental results show that the hybrid summarization system built on the APPSI-139 corpus and the TCSI-pp-V2 framework outperforms large language models, such as GPT-4o and LLaMA-3-70B, in terms of readability and reliability. The source code and dataset are available at https://github.com/EnlightenedAI/APPSI-139.
DB Mar 30
QCFuse: Query-Centric Cache Fusion for Efficient RAG Inference
Jianxin Yan, Zeheng Qian, Wangze Ni et al.
Cache fusion accelerates the generation process of LLMs equipped with RAG through KV caching and selective token recomputation, thereby reducing computational costs and improving efficiency. However, existing methods primarily rely on local perspectives for token selection and lack global awareness from the user query. Utilizing this global awareness is challenging due to the high cost of obtaining context-aware query representations and the strict pipeline constraints required for efficient attention analysis. Thus, this demonstration introduces QCFuse, an innovative KV cache fusion system centered on the user query. QCFuse leverages semantic summary anchors to enhance query representations and selectively recomputes query-related tokens to improve accuracy, updating tokens based on the attention distribution of the most critical Transformer layer to preserve the high efficiency of the pipeline structure. Evaluations on real-world datasets demonstrate that QCFuse significantly improves the response efficiency of LLMs by 40% while maintaining equivalent accuracy compared to current methods. Additionally, in certain scenarios, QCFuse achieves an attention denoising effect that yields higher response accuracy, demonstrating substantial potential in the optimization of LLM inference.
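Selective token recomputation can be pictured with a small sketch. This is an illustration under stated assumptions, not the QCFuse system: the per-token query attention scores are taken as given, the budget `k` is a free parameter, and `select_tokens` and `fuse_cache` are hypothetical names. Cached entries are reused except for the `k` tokens the query attends to most, which are recomputed.

```python
def select_tokens(query_attention, k):
    """Return the indices of the k tokens with the highest query attention."""
    ranked = sorted(range(len(query_attention)),
                    key=lambda i: query_attention[i], reverse=True)
    return sorted(ranked[:k])

def fuse_cache(cached, recompute, query_attention, k):
    """Reuse cached entries, recomputing only the selected tokens.

    cached: list of precomputed per-token entries (stand-ins for KV pairs).
    recompute: callable mapping a token index to a fresh entry.
    """
    chosen = set(select_tokens(query_attention, k))
    return [recompute(i) if i in chosen else cached[i]
            for i in range(len(cached))]

# Toy example: five context tokens, a budget of two recomputations.
attention = [0.05, 0.40, 0.10, 0.30, 0.15]
cached = ["kv0", "kv1", "kv2", "kv3", "kv4"]
picked = select_tokens(attention, 2)
fused = fuse_cache(cached, lambda i: "fresh%d" % i, attention, 2)
```

The budget `k` trades accuracy against cost: a larger `k` recomputes more tokens and reuses less of the cache.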