CLApr 29, 2024
Evaluating and Mitigating Linguistic Discrimination in Large Language ModelsGuoliang Dong, Haoyu Wang, Jun Sun et al.
By training on text in various languages, large language models (LLMs) typically possess multilingual support and demonstrate remarkable capabilities in solving tasks described in different languages. However, LLMs can exhibit linguistic discrimination due to the uneven distribution of training data across languages. That is, LLMs are hard to keep the consistency of responses when faced with the same task but depicted in different languages. In this study, we first explore the consistency in the LLMs' outputs responding to queries in various languages from two aspects: safety and quality. We conduct this analysis with two datasets (AdvBench and NQ) based on four LLMs (Llama2-13b, Gemma-7b, GPT-3.5-turbo and Gemini-pro). The results show that LLMs exhibit stronger human alignment capabilities with queries in English, French, Russian, and Spanish (only 1.04\% of harmful queries successfully jailbreak on average) compared to queries in Bengali, Georgian, Nepali and Maithili (27.7\% of harmful queries jailbreak successfully on average). Moreover, for queries in English, Danish, Czech and Slovenian, LLMs tend to produce responses with a higher quality (with 0.1494 $F_1$ score on average) compared to the other languages. Upon these findings, we propose LDFighter, a similarity-based voting, to mitigate the linguistic discrimination in LLMs. LDFighter ensures consistent service for different language speakers. We evaluate LDFighter with both benign queries and harmful queries. The results show that LDFighter not only significantly reduces the jailbreak success rate but also improve the response quality on average, demonstrating its effectiveness.
CLJun 3, 2024
CodeR: Issue Resolving with Multi-Agent and Task GraphsDong Chen, Shaoxin Lin, Muhan Zeng et al.
GitHub issue resolving recently has attracted significant attention from academia and industry. SWE-bench is proposed to measure the performance in resolving issues. In this paper, we propose CodeR, which adopts a multi-agent framework and pre-defined task graphs to Repair & Resolve reported bugs and add new features within code Repository. On SWE-bench lite, CodeR is able to solve 28.33% of issues, when submitting only once for each issue. We examine the performance impact of each design of CodeR and offer insights to advance this research direction.
CLDec 29, 2021
Repairing Adversarial Texts through PerturbationGuoliang Dong, Jingyi Wang, Jun Sun et al.
It is known that neural networks are subject to attacks through adversarial perturbations, i.e., inputs which are maliciously crafted through perturbations to induce wrong predictions. Furthermore, such attacks are impossible to eliminate, i.e., the adversarial perturbation is still possible after applying mitigation methods such as adversarial training. Multiple approaches have been developed to detect and reject such adversarial inputs, mostly in the image domain. Rejecting suspicious inputs however may not be always feasible or ideal. First, normal inputs may be rejected due to false alarms generated by the detection algorithm. Second, denial-of-service attacks may be conducted by feeding such systems with adversarial inputs. To address the gap, in this work, we propose an approach to automatically repair adversarial texts at runtime. Given a text which is suspected to be adversarial, we novelly apply multiple adversarial perturbation methods in a positive way to identify a repair, i.e., a slightly mutated but semantically equivalent text that the neural network correctly classifies. Our approach has been experimented with multiple models trained for natural language processing tasks and the results show that our approach is effective, i.e., it successfully repairs about 80\% of the adversarial texts. Furthermore, depending on the applied perturbation method, an adversarial text could be repaired in as short as one second on average.
LGJul 17, 2021
Automatic Fairness Testing of Neural Classifiers through Adversarial SamplingPeixin Zhang, Jingyi Wang, Jun Sun et al.
Although deep learning has demonstrated astonishing performance in many applications, there are still concerns about its dependability. One desirable property of deep learning applications with societal impact is fairness (i.e., non-discrimination). Unfortunately, discrimination might be intrinsically embedded into the models due to the discrimination in the training data. As a countermeasure, fairness testing systemically identifies discriminatory samples, which can be used to retrain the model and improve the model's fairness. Existing fairness testing approaches however have two major limitations. Firstly, they only work well on traditional machine learning models and have poor performance (e.g., effectiveness and efficiency) on deep learning models. Secondly, they only work on simple structured (e.g., tabular) data and are not applicable for domains such as text. In this work, we bridge the gap by proposing a scalable and effective approach for systematically searching for discriminatory samples while extending existing fairness testing approaches to address a more challenging domain, i.e., text classification. Compared with state-of-the-art methods, our approach only employs lightweight procedures like gradient computation and clustering, which is significantly more scalable and effective. Experimental results show that on average, our approach explores the search space much more effectively (9.62 and 2.38 times more than the state-of-the-art methods respectively on tabular and text datasets) and generates much more discriminatory samples (24.95 and 2.68 times) within a same reasonable time. Moreover, the retrained models reduce discrimination by 57.2% and 60.2% respectively on average.
LGDec 3, 2020
Towards Repairing Neural Networks CorrectlyGuoliang Dong, Jun Sun, Jingyi Wang et al.
Neural networks are increasingly applied to support decision making in safety-critical applications (like autonomous cars, unmanned aerial vehicles and face recognition based authentication). While many impressive static verification techniques have been proposed to tackle the correctness problem of neural networks, it is possible that static verification may never be sufficiently scalable to handle real-world neural networks. In this work, we propose a runtime verification method to ensure the correctness of neural networks. Given a neural network and a desirable safety property, we adopt state-of-the-art static verification techniques to identify strategically locations to introduce additional gates which "correct" neural network behaviors at runtime. Experiment results show that our approach effectively generates neural networks which are guaranteed to satisfy the properties, whilst being consistent with the original neural network most of the time.
LGSep 22, 2019
Towards Interpreting Recurrent Neural Networks through Probabilistic AbstractionGuoliang Dong, Jingyi Wang, Jun Sun et al.
Neural networks are becoming a popular tool for solving many real-world problems such as object recognition and machine translation, thanks to its exceptional performance as an end-to-end solution. However, neural networks are complex black-box models, which hinders humans from interpreting and consequently trusting them in making critical decisions. Towards interpreting neural networks, several approaches have been proposed to extract simple deterministic models from neural networks. The results are not encouraging (e.g., low accuracy and limited scalability), fundamentally due to the limited expressiveness of such simple models. In this work, we propose an approach to extract probabilistic automata for interpreting an important class of neural networks, i.e., recurrent neural networks. Our work distinguishes itself from existing approaches in two important ways. One is that probability is used to compensate for the loss of expressiveness. This is inspired by the observation that human reasoning is often `probabilistic'. The other is that we adaptively identify the right level of abstraction so that a simple model is extracted in a request-specific way. We conduct experiments on several real-world datasets using state-of-the-art architectures including GRU and LSTM. The result shows that our approach significantly improves existing approaches in terms of accuracy or scalability. Lastly, we demonstrate the usefulness of the extracted models through detecting adversarial texts.
LGDec 14, 2018
Adversarial Sample Detection for Deep Neural Network through Model Mutation TestingJingyi Wang, Guoliang Dong, Jun Sun et al.
Deep neural networks (DNN) have been shown to be useful in a wide range of applications. However, they are also known to be vulnerable to adversarial samples. By transforming a normal sample with some carefully crafted human imperceptible perturbations, even highly accurate DNN make wrong decisions. Multiple defense mechanisms have been proposed which aim to hinder the generation of such adversarial samples. However, a recent work show that most of them are ineffective. In this work, we propose an alternative approach to detect adversarial samples at runtime. Our main observation is that adversarial samples are much more sensitive than normal samples if we impose random mutations on the DNN. We thus first propose a measure of `sensitivity' and show empirically that normal samples and adversarial samples have distinguishable sensitivity. We then integrate statistical hypothesis testing and model mutation testing to check whether an input sample is likely to be normal or adversarial at runtime by measuring its sensitivity. We evaluated our approach on the MNIST and CIFAR10 datasets. The results show that our approach detects adversarial samples generated by state-of-the-art attacking methods efficiently and accurately.