Pei Huang

CL
h-index16
10papers
264citations
Novelty41%
AI Score37

10 Papers

CLMar 21, 2022
A Prompting-based Approach for Adversarial Example Generation and Robustness Enhancement

Yuting Yang, Pei Huang, Juan Cao et al.

Recent years have seen the wide application of NLP models in crucial areas such as finance, medical treatment, and news media, raising concerns of the model robustness and vulnerabilities. In this paper, we propose a novel prompt-based adversarial attack to compromise NLP models and robustness enhancement technique. We first construct malicious prompts for each instance and generate adversarial examples via mask-and-filling under the effect of a malicious purpose. Our attack technique targets the inherent vulnerabilities of NLP models, allowing us to generate samples even without interacting with the victim NLP model, as long as it is based on pre-trained language models (PLMs). Furthermore, we design a prompt-based adversarial training method to improve the robustness of PLMs. As our training method does not actually generate adversarial samples, it can be applied to large-scale training sets efficiently. The experimental results show that our attack method can achieve a high attack success rate with more diverse, fluent and natural adversarial examples. In addition, our robustness enhancement method can significantly improve the robustness of models to resist adversarial attacks. Our work indicates that prompting paradigm has great potential in probing some fundamental flaws of PLMs and fine-tuning them for downstream tasks.

CRFeb 4, 2024
Copyright Protection in Generative AI: A Technical Perspective

Jie Ren, Han Xu, Pengfei He et al.

Generative AI has witnessed rapid advancement in recent years, expanding their capabilities to create synthesized content such as text, images, audio, and code. The high fidelity and authenticity of contents generated by these Deep Generative Models (DGMs) have sparked significant copyright concerns. There have been various legal debates on how to effectively safeguard copyrights in DGMs. This work delves into this issue by providing a comprehensive overview of copyright protection from a technical perspective. We examine from two distinct viewpoints: the copyrights pertaining to the source data held by the data owners and those of the generative models maintained by the model builders. For data copyright, we delve into methods data owners can protect their content and DGMs can be utilized without infringing upon these rights. For model copyright, our discussion extends to strategies for preventing model theft and identifying outputs generated by specific models. Finally, we highlight the limitations of existing techniques and identify areas that remain unexplored. Furthermore, we discuss prospective directions for the future of copyright protection, underscoring its importance for the sustainable and ethical development of Generative AI.

LGDec 20, 2023
Towards Efficient Verification of Quantized Neural Networks

Pei Huang, Haoze Wu, Yuting Yang et al.

Quantization replaces floating point arithmetic with integer arithmetic in deep neural network models, providing more efficient on-device inference with less power and memory. In this work, we propose a framework for formally verifying properties of quantized neural networks. Our baseline technique is based on integer linear programming which guarantees both soundness and completeness. We then show how efficiency can be improved by utilizing gradient-based heuristic search methods and also bound-propagation techniques. We evaluate our approach on perception networks quantized with PyTorch. Our results show that we can verify quantized networks with better scalability and efficiency than the previous state of the art.

CVDec 15, 2025
The Renaissance of Expert Systems: Optical Recognition of Printed Chinese Jianpu Musical Scores with Lyrics

Fan Bu, Rongfeng Li, Zijin Li et al.

Large-scale optical music recognition (OMR) research has focused mainly on Western staff notation, leaving Chinese Jianpu (numbered notation) and its rich lyric resources underexplored. We present a modular expert-system pipeline that converts printed Jianpu scores with lyrics into machine-readable MusicXML and MIDI, without requiring massive annotated training data. Our approach adopts a top-down expert-system design, leveraging traditional computer-vision techniques (e.g., phrase correlation, skeleton analysis) to capitalize on prior knowledge, while integrating unsupervised deep-learning modules for image feature embeddings. This hybrid strategy strikes a balance between interpretability and accuracy. Evaluated on The Anthology of Chinese Folk Songs, our system massively digitizes (i) a melody-only collection of more than 5,000 songs (> 300,000 notes) and (ii) a curated subset with lyrics comprising over 1,400 songs (> 100,000 notes). The system achieves high-precision recognition on both melody (note-wise F1 = 0.951) and aligned lyrics (character-wise F1 = 0.931).

AIJan 25, 2024
Marabou 2.0: A Versatile Formal Analyzer of Neural Networks

Haoze Wu, Omri Isac, Aleksandar Zeljić et al.

This paper serves as a comprehensive system description of version 2.0 of the Marabou framework for formal analysis of neural networks. We discuss the tool's architectural design and highlight the major features and components introduced since its initial release.

CLJan 15, 2022
A Dual Prompt Learning Framework for Few-Shot Dialogue State Tracking

Yuting Yang, Wenqiang Lei, Pei Huang et al.

Dialogue state tracking (DST) module is an important component for task-oriented dialog systems to understand users' goals and needs. Collecting dialogue state labels including slots and values can be costly, especially with the wide application of dialogue systems in more and more new-rising domains. In this paper, we focus on how to utilize the language understanding and generation ability of pre-trained language models for DST. We design a dual prompt learning framework for few-shot DST. Specifically, we consider the learning of slot generation and value generation as dual tasks, and two prompts are designed based on such a dual structure to incorporate task-related knowledge of these two tasks respectively. In this way, the DST task can be formulated as a language modeling task efficiently under few-shot settings. Experimental results on two task-oriented dialogue datasets show that the proposed method not only outperforms existing state-of-the-art few-shot methods, but also can generate unseen slots. It indicates that DST-related knowledge can be probed from PLM and utilized to address low-resource DST efficiently with the help of prompt learning.

CLJan 11, 2022
Quantifying Robustness to Adversarial Word Substitutions

Yuting Yang, Pei Huang, FeiFei Ma et al.

Deep-learning-based NLP models are found to be vulnerable to word substitution perturbations. Before they are widely adopted, the fundamental issues of robustness need to be addressed. Along this line, we propose a formal framework to evaluate word-level robustness. First, to study safe regions for a model, we introduce robustness radius which is the boundary where the model can resist any perturbation. As calculating the maximum robustness radius is computationally hard, we estimate its upper and lower bound. We repurpose attack methods as ways of seeking upper bound and design a pseudo-dynamic programming algorithm for a tighter upper bound. Then verification method is utilized for a lower bound. Further, for evaluating the robustness of regions outside a safe radius, we reexamine robustness from another view: quantification. A robustness metric with a rigorous statistical guarantee is introduced to measure the quantification of adversarial examples, which indicates the model's susceptibility to perturbations outside the safe radius. The metric helps us figure out why state-of-the-art models like BERT can be easily fooled by a few word substitutions, but generalize well in the presence of real-world noises.

AINov 15, 2021
Can Graph Neural Networks Learn to Solve MaxSAT Problem?

Minghao Liu, Fuqi Jia, Pei Huang et al.

With the rapid development of deep learning techniques, various recent work has tried to apply graph neural networks (GNNs) to solve NP-hard problems such as Boolean Satisfiability (SAT), which shows the potential in bridging the gap between machine learning and symbolic reasoning. However, the quality of solutions predicted by GNNs has not been well investigated in the literature. In this paper, we study the capability of GNNs in learning to solve Maximum Satisfiability (MaxSAT) problem, both from theoretical and practical perspectives. We build two kinds of GNN models to learn the solution of MaxSAT instances from benchmarks, and show that GNNs have attractive potential to solve MaxSAT problem through experimental evaluation. We also present a theoretical explanation of the effect that GNNs can learn to solve MaxSAT problem to some extent for the first time, based on the algorithmic alignment theory.

LGOct 29, 2021
ε-weakened Robustness of Deep Neural Networks

Pei Huang, Yuting Yang, Minghao Liu et al.

This paper introduces a notation of $\varepsilon$-weakened robustness for analyzing the reliability and stability of deep neural networks (DNNs). Unlike the conventional robustness, which focuses on the "perfect" safe region in the absence of adversarial examples, $\varepsilon$-weakened robustness focuses on the region where the proportion of adversarial examples is bounded by user-specified $\varepsilon$. Smaller $\varepsilon$ means a smaller chance of failure. Under such robustness definition, we can give conclusive results for the regions where conventional robustness ignores. We prove that the $\varepsilon$-weakened robustness decision problem is PP-complete and give a statistical decision algorithm with user-controllable error bound. Furthermore, we derive an algorithm to find the maximum $\varepsilon$-weakened robustness radius. The time complexity of our algorithms is polynomial in the dimension and size of the network. So, they are scalable to large real-world networks. Besides, We also show its potential application in analyzing quality issues.

ROApr 15, 2017
Securing a UAV Using Individual Characteristics From an EEG Signal

Ashutosh Singandhupe, Hung Manh La, David Feil-Seifer et al.

Unmanned aerial vehicles (UAVs) have gained much attention in recent years for both commercial and military applications. The progress in this field has gained much popularity and the research has encompassed various fields of scientific domain. Cyber securing a UAV communication has been one of the active research field since the attack on Predator UAV video stream hijacking in 2009. Since UAVs rely heavily on on-board autopilot to function, it is important to develop an autopilot system that is robust to possible cyber attacks. In this work, we present a biometric system to encrypt the UAV communication by generating a key which is derived from Beta component of the EEG signal of a user. We have developed a safety mechanism that would be activated in case the communication of the UAV from the ground control station gets attacked. This system has been validated on a commercial UAV under malicious attack conditions during which we implement a procedure where the UAV return safely to a "home" position.