Yujie Gu

h-index: 29

Papers: 5

Citations: 15

Novelty: 52%

AI Score: 28

Ranked #150,189 of 194,257 authors (top 77%); #3,989 in CR (top 59%)

5 Papers

10.7 | AI | Mar 29, 2024 | Code

The Impact of Prompts on Zero-Shot Detection of AI-Generated Text

Kaito Taguchi, Yujie Gu, Kouichi Sakurai

In recent years, there have been significant advancements in the development of Large Language Models (LLMs). While their practical applications are now widespread, their potential for misuse, such as generating fake news and committing plagiarism, has posed significant concerns. To address this issue, detectors have been developed to evaluate whether a given text is human-generated or AI-generated. Among others, zero-shot detectors stand out as effective approaches that do not require additional training data and are often likelihood-based. In chat-based applications, users commonly input prompts and utilize the AI-generated texts. However, zero-shot detectors typically analyze these texts in isolation, neglecting the impact of the original prompts. It is conceivable that this approach may lead to a discrepancy in likelihood assessments between the text generation phase and the detection phase. So far, there remains an unverified gap concerning how the presence or absence of prompts impacts detection accuracy for zero-shot detectors. In this paper, we introduce an evaluative framework to empirically analyze the impact of prompts on the detection accuracy of AI-generated text. We assess various zero-shot detectors using both white-box detection, which leverages the prompt, and black-box detection, which operates without prompt information. Our experiments reveal the significant influence of prompts on detection accuracy. Remarkably, compared with black-box detection without prompts, the white-box methods using prompts demonstrate an increase in AUC of at least 0.1 across all zero-shot detectors tested. Code is available at https://github.com/kaito25atugich/Detector.
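
The gap between white-box and black-box scoring described above can be illustrated with a toy likelihood computation. This is only a sketch: the bigram table, its probabilities, and the names `mean_log_likelihood` and `toy_logprob` are hypothetical stand-ins for a real language model and are not taken from the paper or its repository.

```python
import math

def mean_log_likelihood(tokens, logprob_fn, prompt=()):
    """Average log-probability of `tokens`, optionally conditioned on `prompt`."""
    context = list(prompt)
    total = 0.0
    for tok in tokens:
        total += logprob_fn(tuple(context), tok)
        context.append(tok)
    return total / len(tokens)

# Toy stand-in for an LLM: a bigram table with made-up probabilities.
BIGRAM = {
    ("<s>", "cats"): 0.1,
    ("about", "cats"): 0.6,
    ("cats", "purr"): 0.5,
}

def toy_logprob(context, token):
    """Log-probability of `token` given only the previous token (assumption)."""
    prev = context[-1] if context else "<s>"
    return math.log(BIGRAM.get((prev, token), 0.01))

text = ["cats", "purr"]
# Black-box: the detector scores the text in isolation.
black_box = mean_log_likelihood(text, toy_logprob)
# White-box: the detector also sees the prompt that produced the text.
white_box = mean_log_likelihood(text, toy_logprob, prompt=["write", "about"])
```

In this toy setting the same text receives a higher average log-likelihood when the prompt is supplied, which is the kind of generation-versus-detection discrepancy the abstract refers to.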

4.2 | CR | Sep 18, 2024

PAD-FT: A Lightweight Defense for Backdoor Attacks via Data Purification and Fine-Tuning

Yukai Xu, Yujie Gu, Kouichi Sakurai

Backdoor attacks pose a significant threat to deep neural networks, particularly as recent advancements have led to increasingly subtle implantation, making the defense more challenging. Existing defense mechanisms typically rely on an additional clean dataset as a standard reference and involve retraining an auxiliary model or fine-tuning the entire victim model. However, these approaches are often computationally expensive and not always feasible in practical applications. In this paper, we propose a novel and lightweight defense mechanism, termed PAD-FT, that does not require an additional clean dataset and fine-tunes only a very small part of the model to disinfect the victim model. To achieve this, our approach first introduces a simple data purification process to identify and select the most-likely clean data from the poisoned training dataset. The self-purified clean dataset is then used for activation clipping and fine-tuning only the last classification layer of the victim model. By integrating data purification, activation clipping, and classifier fine-tuning, our mechanism PAD-FT demonstrates superior effectiveness across multiple backdoor attack methods and datasets, as confirmed through extensive experimental evaluation.
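
The abstract does not state how activation clipping is carried out, so the following is a minimal sketch of one common form of the idea: per-unit upper bounds are estimated from the purified samples and later activations are clamped to them. The function names and the activation values are hypothetical, and the data purification and classifier fine-tuning steps are not shown.

```python
def fit_clip_bounds(clean_activations):
    """Per-unit upper bound: the largest activation seen on the purified samples."""
    return [max(unit) for unit in zip(*clean_activations)]

def clip_activations(activation, bounds):
    """Clamp each unit to its bound so unusually large activations are suppressed."""
    return [min(a, b) for a, b in zip(activation, bounds)]

# Hypothetical penultimate-layer activations for three purified samples.
clean = [[0.2, 1.0, 0.5], [0.4, 0.8, 0.1], [0.3, 1.2, 0.6]]
bounds = fit_clip_bounds(clean)

# A hypothetical input on which one unit fires far above its clean range.
suspicious = [0.3, 9.0, 0.4]
clipped = clip_activations(suspicious, bounds)
```

Only the out-of-range unit is altered; activations that stay within the range seen on the purified data pass through unchanged.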

4.2 | CR | Sep 16, 2024

FreeMark: A Non-Invasive White-Box Watermarking for Deep Neural Networks

Yuzhang Chen, Jiangnan Zhu, Yujie Gu et al.

Deep neural networks (DNNs) have achieved significant success in real-world applications. However, safeguarding their intellectual property (IP) remains extremely challenging. Existing DNN watermarking methods for IP protection often require modifying DNN models, which reduces model performance and limits their practicality. This paper introduces FreeMark, a novel DNN watermarking framework that leverages cryptographic principles without altering the original host DNN model, thereby avoiding any reduction in model performance. Unlike traditional DNN watermarking methods, FreeMark innovatively generates secret keys from a pre-generated watermark vector and the host model using gradient descent. These secret keys, used to extract the watermark from the model's activation values, are securely stored with a trusted third party, enabling reliable watermark extraction from suspect models. Extensive experiments demonstrate that FreeMark effectively resists various watermark removal attacks while maintaining high watermark capacity.
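
The idea of fitting a key by gradient descent while leaving the host model untouched can be sketched as follows. This is an assumption-laden toy, not FreeMark's actual construction: the logistic loss, the key shape, the fixed activation vector, and the names `fit_key` and `extract` are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_key(activation, watermark, lr=0.5, steps=200):
    """Gradient descent on the key matrix only; the activation vector is never changed."""
    key = [[0.0] * len(activation) for _ in watermark]
    for _ in range(steps):
        for row, bit in zip(key, watermark):
            pred = sigmoid(sum(k * a for k, a in zip(row, activation)))
            for i, a in enumerate(activation):
                row[i] -= lr * (pred - bit) * a  # gradient of the logistic loss
    return key

def extract(activation, key):
    """Recover one watermark bit per key row by thresholding the projection at zero."""
    return [1 if sum(k * a for k, a in zip(row, activation)) > 0 else 0 for row in key]

activation = [0.5, -1.0, 2.0]  # hypothetical activation values of the host model
watermark = [1, 0, 1, 1]       # hypothetical pre-generated watermark vector
key = fit_key(activation, watermark)
recovered = extract(activation, key)
```

Because only `key` is optimized, the model that produced `activation` is left as it was, which mirrors the non-invasive property the abstract emphasizes.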

6.4 | LG | Feb 16, 2024

TernaryVote: Differentially Private, Communication Efficient, and Byzantine Resilient Distributed Optimization on Heterogeneous Data

Richeng Jin, Yujie Gu, Kai Yue et al.

Distributed training of deep neural networks faces three critical challenges: privacy preservation, communication efficiency, and robustness to faults and adversarial behaviors. Although significant research efforts have been devoted to addressing these challenges independently, their synthesis remains less explored. In this paper, we propose TernaryVote, which combines a ternary compressor and the majority vote mechanism to realize differential privacy, gradient compression, and Byzantine resilience simultaneously. We theoretically quantify the privacy guarantee through the lens of the emerging $f$-differential privacy (DP) and the Byzantine resilience of the proposed algorithm. Particularly, in terms of privacy guarantees, compared to the existing sign-based approach StoSign, the proposed method improves the dimension dependence on the gradient size and enjoys privacy amplification by mini-batch sampling while ensuring a comparable convergence rate. We also prove that TernaryVote is robust when fewer than 50% of workers are blind attackers, which matches that of SIGNSGD with majority vote. Extensive experimental results validate the effectiveness of the proposed algorithm.
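
The combination of a ternary compressor with majority voting can be sketched as below. The specific stochastic rule used here (emit the sign with probability |g|/bound, otherwise 0) is an assumption chosen for illustration; the abstract does not give the compressor's exact form, and `ternary_compress`, `majority_vote`, and `bound` are hypothetical names.

```python
import random

def ternary_compress(grad, bound, rng):
    """Map each coordinate to -1, 0, or +1; nonzero with probability |g| / bound."""
    out = []
    for g in grad:
        g = max(-bound, min(bound, g))  # clip so the probability stays in [0, 1]
        if rng.random() < abs(g) / bound:
            out.append(1 if g > 0 else -1)
        else:
            out.append(0)
    return out

def majority_vote(votes):
    """Coordinate-wise sign of the summed ternary votes from all workers."""
    result = []
    for i in range(len(votes[0])):
        total = sum(v[i] for v in votes)
        result.append((total > 0) - (total < 0))
    return result

rng = random.Random(0)
# Coordinates at the bound are always sent as their sign; a zero coordinate is always 0.
compressed = ternary_compress([1.0, -1.0, 0.0], bound=1.0, rng=rng)
# Three workers vote; one disagrees on the first coordinate and is outvoted.
aggregated = majority_vote([[1, -1, 0], [1, 1, 0], [-1, -1, 0]])
```

Each worker sends one of three values per coordinate, and a minority of dissenting workers cannot flip the aggregated sign.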

1.2 | IT | Feb 21, 2022

On the Information-theoretic Security of Combinatorial All-or-nothing Transforms

Yujie Gu, Sonata Akao, Navid Nasr Esfahani et al.

All-or-nothing transforms (AONTs) were proposed by Rivest as a message preprocessing technique for encrypting data to protect against brute-force attacks, and have numerous applications in cryptography and information security. Later, unconditionally secure AONTs and their combinatorial characterization were introduced by Stinson. Informally, a combinatorial AONT is an array satisfying certain unbiasedness requirements, and its security properties in general depend on the prior probability distribution on the input $s$-tuples. Recently, it was shown by Esfahani and Stinson that a combinatorial AONT has perfect security provided that all the input $s$-tuples are equiprobable, and has weak security provided that all the input $s$-tuples have non-zero probability. This paper aims to explore the gap between perfect security and weak security for combinatorial $(t,s,v)$-AONTs. Concretely, we consider the typical scenario in which all the $s$ inputs take values independently (but not necessarily identically) and quantify the amount of information $H(\mathcal{X}|\mathcal{Y})$ about any $t$ inputs $\mathcal{X}$ that is not revealed by any $s-t$ outputs $\mathcal{Y}$. In particular, we establish general lower and upper bounds on $H(\mathcal{X}|\mathcal{Y})$ for combinatorial AONTs using information-theoretic techniques, and also show that the derived bounds can be attained in certain cases. Furthermore, the discussion is extended to the security properties of combinatorial asymmetric AONTs.
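
The quantity $H(\mathcal{X}|\mathcal{Y})$ can be computed by brute force on a small example. The sketch below uses a toy linear map over $\mathbb{Z}_3$ with matrix rows $(1,1)$ and $(1,2)$, chosen here as an illustrative $(1,2,3)$-AONT and not taken from the paper; the input distributions and function names are likewise hypothetical.

```python
import math
from itertools import product

V = 3  # alphabet size; all arithmetic is modulo 3

def transform(x1, x2):
    """Toy linear map over Z_3 with matrix [[1, 1], [1, 2]] (illustrative choice)."""
    return ((x1 + x2) % V, (x1 + 2 * x2) % V)

def cond_entropy_x1_given_y1(p1, p2):
    """H(X1 | Y1) in bits for independent inputs with distributions p1 and p2."""
    joint = {}  # (x1, y1) -> probability
    for x1, x2 in product(range(V), repeat=2):
        y1, _ = transform(x1, x2)
        joint[(x1, y1)] = joint.get((x1, y1), 0.0) + p1[x1] * p2[x2]
    p_y = {}
    for (x1, y1), p in joint.items():
        p_y[y1] = p_y.get(y1, 0.0) + p
    return -sum(p * math.log2(p / p_y[y1]) for (x1, y1), p in joint.items() if p > 0)

uniform = [1 / 3, 1 / 3, 1 / 3]
skewed = [0.5, 0.25, 0.25]
h_uniform = cond_entropy_x1_given_y1(uniform, uniform)  # equiprobable inputs
h_skewed = cond_entropy_x1_given_y1(skewed, skewed)     # independent, non-uniform inputs
```

With equiprobable inputs the conditional entropy equals $\log_2 3$, so one output reveals nothing about one input. With the skewed inputs it falls strictly between $0$ and $H(X_1) = 1.5$ bits, which is the intermediate regime the abstract sets out to bound.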