CR · Oct 11, 2024
Can a large language model be a gaslighter?
Wei Li, Luyao Zhu, Yang Song et al.
Large language models (LLMs) have gained human trust due to their capabilities and helpfulness. However, this trust may in turn allow LLMs to affect users' mindsets by manipulating language, a psychological effect termed gaslighting. In this work, we investigate the vulnerability of LLMs to prompt-based and fine-tuning-based gaslighting attacks. To this end, we propose a two-stage framework, DeepCoG, designed to: 1) elicit gaslighting plans from LLMs with the proposed DeepGaslighting prompting template, and 2) acquire gaslighting conversations from LLMs through our Chain-of-Gaslighting method. The gaslighting conversation dataset, along with a corresponding safe dataset, is used both for fine-tuning-based attacks on open-source LLMs and for anti-gaslighting safety alignment of these LLMs. Experiments demonstrate that both prompt-based and fine-tuning-based attacks transform three open-source LLMs into gaslighters. In contrast, we advance three safety alignment strategies that strengthen the safety guardrail of LLMs by 12.05%. Our safety alignment strategies have minimal impact on the utility of LLMs. Empirical studies indicate that an LLM may be a potential gaslighter even if it passes the harmfulness test on general dangerous queries.
CL · Mar 7, 2025
Ensemble Debiasing Across Class and Sample Levels for Fairer Prompting Accuracy
Ruixi Lin, Ziqiao Wang, Yang You
Language models are strong few-shot learners and achieve good overall accuracy in text classification tasks, masking the fact that their results suffer from severe class accuracy imbalance. We believe that gains in overall accuracy should not come from enriching the strong classes, but from raising up the weak ones. To address the imbalance, we propose a Heaviside step function based ensemble debiasing method, which enables flexible rectification of in-context learned class probabilities at both the class and sample levels. Evaluations with Llama-2-13B on seven text classification benchmarks show that our approach achieves state-of-the-art overall accuracy gains with balanced class accuracies. More importantly, we analyze the resulting probability correction scheme, showing that sample-level corrections are necessary to elevate weak classes. Because it effectively corrects weak classes, our method also brings significant performance gains to a larger model variant, Llama-2-70B, especially on a biomedical domain task, further demonstrating the necessity of ensemble debiasing at both levels. Our source code is available at https://github.com/NUS-HPC-AI-Lab/DCS.
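The abstract says a Heaviside step function switches between class-level and sample-level rectifications but does not give the formula. The following Python sketch is a hypothetical illustration, not the DCS implementation: each class c has an assumed selector s_c, H(s_c) = 1 applies an assumed range-dependent (sample-level) rescaling, and H(s_c) = 0 applies a class-level multiplicative weight. The names `debias`, `selectors`, and `sample_params` are invented here, and the parameters, which the method would optimize, are set by hand.

```python
from typing import List, Sequence, Tuple


def heaviside(x: float) -> float:
    """Heaviside step function, using the convention H(0) = 0."""
    return 1.0 if x > 0 else 0.0


def sample_level_correction(p: float, threshold: float, low_scale: float, high_scale: float) -> float:
    """Hypothetical range-dependent correction: rescale p differently below vs. at/above a threshold."""
    return p * (low_scale if p < threshold else high_scale)


def debias(
    probs: Sequence[float],
    class_weights: Sequence[float],
    selectors: Sequence[float],
    sample_params: Sequence[Tuple[float, float, float]],
) -> Tuple[int, List[float]]:
    """Correct one instance's class probabilities; return (predicted class, corrected scores).

    For class c, H(selectors[c]) = 1 applies the sample-level correction and
    H(selectors[c]) = 0 applies the class-level weight (an assumed formulation).
    """
    corrected = []
    for c, p in enumerate(probs):
        h = heaviside(selectors[c])
        sample_term = sample_level_correction(p, *sample_params[c])
        class_term = class_weights[c] * p
        corrected.append(h * sample_term + (1.0 - h) * class_term)
    predicted = max(range(len(corrected)), key=corrected.__getitem__)
    return predicted, corrected


# Toy instance where class 0 dominates the raw ICL probabilities.
pred, scores = debias(
    probs=[0.6, 0.3, 0.1],
    class_weights=[0.5, 1.0, 1.0],   # class-level: damp the dominant class 0
    selectors=[-1.0, -1.0, 1.0],     # classes 0 and 1 class-level, class 2 sample-level
    sample_params=[(0.5, 1.0, 1.0), (0.5, 1.0, 1.0), (0.2, 4.0, 1.0)],  # (threshold, low, high)
)
print(pred, scores)  # 2 [0.3, 0.3, 0.4]
```

Here the dominant class 0 is damped by its class-level weight, while class 2 is boosted only because its probability falls below the assumed threshold, a per-instance behavior that a class-level weight alone cannot express.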
AS · Jan 14, 2020
Improved Robust ASR for Social Robots in Public Spaces
Charles Jankowski, Vishwas Mruthyunjaya, Ruixi Lin
Social robots deployed in public spaces present a challenging task for ASR because of a variety of factors, including noise at signal-to-noise ratios (SNRs) of 20 to 5 dB. Existing ASR models perform well at the higher SNRs in this range but degrade considerably with more noise. This work explores methods for improving ASR performance in such conditions. We use the AiShell-1 Chinese speech corpus and the Kaldi ASR toolkit for evaluations. We were able to exceed state-of-the-art ASR performance at SNRs below 20 dB, demonstrating the feasibility of achieving relatively high-performing ASR with open-source toolkits and the hundreds of hours of training data that are commonly available.
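The abstract does not say how the noisy conditions were produced. One common way to build evaluation or training audio at a target SNR, shown here as a generic Python sketch rather than the paper's pipeline, is to scale a noise signal so that 10 * log10(P_speech / P_noise) equals the target. Plain lists stand in for audio, a 440 Hz tone stands in for speech, and the 16 kHz sample rate is an assumption.

```python
import math
import random
from typing import List, Sequence


def mean_power(x: Sequence[float]) -> float:
    """Mean power of a signal: average of squared samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Add noise to speech, scaled so that 10*log10(P_speech / P_noise) == snr_db."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_speech, p_noise = mean_power(speech), mean_power(noise)
    # Solve P_speech / (scale^2 * P_noise) = 10^(snr_db / 10) for scale.
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def measured_snr_db(speech: Sequence[float], mixture: Sequence[float]) -> float:
    """Recover the SNR by treating (mixture - speech) as the added noise."""
    residual = [m - s for s, m in zip(speech, mixture)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))


random.seed(0)
sr = 16000  # assumed 16 kHz sample rate
speech = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]  # 1 s tone as a stand-in for speech
noise = [random.gauss(0.0, 1.0) for _ in range(sr)]
for target in (20, 15, 10, 5):  # the SNR range named in the abstract
    print(target, round(measured_snr_db(speech, mix_at_snr(speech, noise, target)), 3))
```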
LG · Mar 7
Discovering the Hidden Role of Gini Index In Prompt-based Classification
Ruixi Lin
In classification tasks, the long-tailed minority classes often carry the most important predictions. Yet these classes consistently exhibit low accuracies, while a few high-performing classes dominate. We pursue a foundational understanding of the hidden role of the Gini index as a tool for detecting and optimizing (debiasing) disparities in class accuracy, focusing on prompt-based classification. We introduce the intuitions behind it, benchmark Gini scores in real-world LLMs and vision models, and thoroughly discuss the insights of Gini not only as a measure of relative accuracy dominance but also as a direct optimization metric. Through rigorous case analyses, we first show that weak-to-strong relative accuracy imbalance exists in prompt-based text and image classification results alike, regardless of whether the classification is high-dimensional or low-dimensional. Then, we harness the Gini metric to propose a post-hoc, model-agnostic bias mitigation method. Experimental results across few-shot news, biomedical, and zero-shot image classification show that our method significantly reduces both relative and absolute accuracy imbalances, minimizing the relative dominance of top classes while elevating the weakest classes.
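Assuming the standard Gini coefficient applied to per-class accuracies (the abstract does not spell out the exact formulation), the score is the mean absolute difference between class accuracies divided by twice their mean: 0 means all classes are equally accurate, and it approaches (n - 1)/n as accuracy concentrates in a single class. A minimal Python sketch with toy labels:

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def per_class_accuracy(labels: Sequence[Hashable], preds: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Accuracy (recall) of each gold class."""
    total: Dict[Hashable, int] = defaultdict(int)
    correct: Dict[Hashable, int] = defaultdict(int)
    for y, p in zip(labels, preds):
        total[y] += 1
        correct[y] += int(y == p)
    return {c: correct[c] / total[c] for c in total}


def gini(values: Sequence[float]) -> float:
    """Gini coefficient: sum_i sum_j |v_i - v_j| / (2 * n^2 * mean)."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0  # every class at zero accuracy: no relative dominance to measure
    abs_diffs = sum(abs(a - b) for a in values for b in values)
    return abs_diffs / (2 * n * n * mean)


labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
preds = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]  # class 0 absorbs most predictions
accs = per_class_accuracy(labels, preds)
print(accs)                                  # {0: 1.0, 1: 0.5, 2: 0.0}
print(round(gini(list(accs.values())), 4))  # 0.4444
```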
CL · Dec 26, 2024
Let the Fuzzy Rule Speak: Enhancing In-context Learning Debiasing with Interpretability
Ruixi Lin, Yang You
Large language models (LLMs) often struggle with balanced class accuracy in text classification tasks using in-context learning (ICL), which hinders some practical uses due to user dissatisfaction or safety risks caused by misclassifications. Retraining LLMs to address root causes in data or model priors is neither easy nor cost-effective. This paper examines the class accuracy imbalance issue more deeply, identifying that it arises because certain classes consistently receive disproportionately high ICL probabilities, causing under-prediction and lower accuracy for others. More importantly, different probability ranges affect the imbalance differently, allowing for precise, range-specific corrections. We introduce FuRud (Fuzzy Rule Optimization-based Debiasing), a method for sample-level class probability correction. FuRud tackles interpretability challenges by determining why certain classes need corrections and by tailoring adjustments to each instance's class probabilities. These adjustments are powered by fuzzy sets with triangular membership functions, which transform a class probability based on the range it belongs to. By solving a nonlinear integer programming problem on a labeled set of ICL class probabilities to minimize class accuracy bias (COBias) and maximize overall accuracy, FuRud selects for each class an optimal correction function from 19 triangular membership functions without updating the LLM, and the selected functions correct test instances at inference. Across seven benchmark datasets, FuRud reduces COBias by over half (56%) and improves overall accuracy by 21% in relative terms, outperforming state-of-the-art debiasing methods.
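A minimal Python sketch of the triangular membership function and of range-dependent correction. It assumes that a class's corrected score is simply the membership degree of its probability under the function selected for that class, with `None` meaning no correction; the three-function `BANK` is hypothetical (FuRud selects from 19 functions), and the per-class selection, which FuRud obtains by solving the NIP on labeled data, is fixed by hand.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Triangle = Tuple[float, float, float]  # (a, b, c): support [a, c], peak at b


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership degree of x; a == b or b == c gives a shoulder shape."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)  # rising edge (b > x >= a, so b - a > 0)
    return (c - x) / (c - b)      # falling edge (c >= x > b, so c - b > 0)


# Hypothetical three-function bank; FuRud's actual bank has 19 functions.
BANK: Dict[str, Triangle] = {
    "low-shoulder": (0.0, 0.0, 0.5),   # peaks at probability 0
    "middle": (0.0, 0.5, 1.0),
    "high-shoulder": (0.5, 1.0, 1.0),  # peaks at probability 1
}


def correct_instance(probs: Sequence[float], selection: Sequence[Optional[str]]) -> Tuple[int, List[float]]:
    """Transform each class probability with its class's selected function (None = keep as-is)."""
    corrected = [p if name is None else triangular(p, *BANK[name]) for p, name in zip(probs, selection)]
    return max(range(len(corrected)), key=corrected.__getitem__), corrected


pred, scores = correct_instance([0.7, 0.25, 0.05], [None, None, "low-shoulder"])
print(pred, [round(s, 2) for s in scores])  # 2 [0.7, 0.25, 0.9]
```

Because each membership function is piecewise linear over fixed probability ranges, every correction can be read as a rule tied to a range of the class probability, which is the interpretability argument the abstract makes.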
CL · May 13, 2024
Optimizing Class-Level Probability Reweighting Coefficients for Equitable Prompting Accuracy
Ruixi Lin, Yang You
Even as we engineer LLMs for alignment and safety, they often exhibit biases stemming from statistical regularities in pre-training data (from disproportionate co-occurrences to stereotypical associations mirroring human cognitive biases). This leads to persistent, uneven class accuracy in classification and QA. Such per-class accuracy disparities are not inherently resolved by architectural or training advances or by data scaling, making post-hoc correction essential for equitable performance. To mitigate LLM class accuracy imbalance, we develop a post-hoc probability reweighting method that directly optimizes non-differentiable performance-driven and fairness-aligned metrics, including a novel COBias metric that highlights disparities in class accuracies. This post-hoc bias mitigation method is grounded in discrete optimization with nonlinear integer programming (NIP) objectives and an efficient metaheuristic solution framework with theoretical convergence guarantees. Operating model-agnostically, it learns reweighting coefficients from output class probabilities to adjust LLM inference outputs without internal weight updates. Evaluations demonstrate its effectiveness: it reduces COBias by 61% and increases overall accuracy by 18% (both relative), and it achieves robust within-task generalization across diverse prompt configurations.
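A small Python sketch of the reweighting idea. Two parts are assumptions: COBias is taken to be the mean absolute accuracy difference over all class pairs, and the objective is overall accuracy minus `alpha` times COBias. Exhaustive search over a tiny integer weight grid stands in for the paper's metaheuristic NIP solver.

```python
from itertools import combinations, product
from typing import List, Sequence


def class_accuracies(labels: Sequence[int], preds: Sequence[int], num_classes: int) -> List[float]:
    """Per-class accuracy; a class with no gold examples gets 0.0."""
    accs = []
    for c in range(num_classes):
        idx = [i for i, y in enumerate(labels) if y == c]
        accs.append(sum(preds[i] == c for i in idx) / len(idx) if idx else 0.0)
    return accs


def cobias(accs: Sequence[float]) -> float:
    """Mean absolute accuracy difference over all class pairs (assumed COBias definition)."""
    pairs = list(combinations(accs, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def reweight_predict(probs: Sequence[Sequence[float]], weights: Sequence[float]) -> List[int]:
    """Post-hoc correction: predict argmax_c weights[c] * p[c]; the LLM itself is untouched."""
    return [max(range(len(p)), key=lambda c, p=p: weights[c] * p[c]) for p in probs]


def search_weights(probs, labels, grid, alpha=1.0):
    """Exhaustive search over a discrete weight grid, maximizing accuracy - alpha * COBias."""
    num_classes = len(probs[0])
    best = None
    for weights in product(grid, repeat=num_classes):
        preds = reweight_predict(probs, weights)
        overall = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        bias = cobias(class_accuracies(labels, preds, num_classes))
        score = overall - alpha * bias
        if best is None or score > best[0]:
            best = (score, weights, overall, bias)
    return best  # (score, weights, overall accuracy, COBias)


probs = [[0.65, 0.25, 0.10], [0.50, 0.40, 0.10], [0.50, 0.20, 0.30]]  # toy ICL outputs
labels = [0, 1, 2]
print(search_weights(probs, labels, grid=[1, 2, 3]))  # (1.0, (1, 2, 2), 1.0, 0.0)
```

With uniform weights every toy example is predicted as class 0 (COBias 2/3); doubling the weights of classes 1 and 2 removes the imbalance without touching the model. Exhaustive search costs |grid|^K evaluations for K classes, which is why a metaheuristic is needed beyond toy sizes.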
CL · Nov 2, 2021
System Combination for Grammatical Error Correction Based on Integer Programming
Ruixi Lin, Hwee Tou Ng
In this paper, we propose a system combination method for grammatical error correction (GEC) based on nonlinear integer programming (IP). Our method optimizes a novel F score objective based on error types and combines multiple end-to-end GEC systems. The proposed IP approach optimizes the selection of a single best system for each grammatical error type present in the data. Experiments applying the IP approach to combine state-of-the-art standalone GEC systems show that the combined system outperforms all standalone systems. It improves the F0.5 score by 3.61% when combining the two best participating systems in the BEA 2019 shared task, achieving an F0.5 score of 73.08%. We also compare our IP approach with another state-of-the-art system combination method for GEC, demonstrating IP's competitive combination capability.
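The per-type selection can be sketched in Python as below. This is an illustration, not the paper's formulation: it assumes the combined system's true positives, false positives, and false negatives are the sums of the per-type counts of whichever system is chosen for each type, and it enumerates all assignments instead of calling an IP solver. The systems, error types, and counts are toy values.

```python
from itertools import product
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F_beta = (1 + b^2) TP / ((1 + b^2) TP + b^2 FN + FP); beta = 0.5 weights precision higher."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def best_combination(counts: Dict[str, Dict[str, Counts]], beta: float = 0.5):
    """Pick one system per error type to maximize the pooled F_beta (exhaustive search).

    Assumes every system reports counts for every error type.
    """
    systems = sorted(counts)
    types = sorted({t for s in systems for t in counts[s]})
    best_score, best_choice = -1.0, {}
    for choice in product(systems, repeat=len(types)):
        tp = fp = fn = 0
        for system, etype in zip(choice, types):
            t, f, n = counts[system][etype]
            tp, fp, fn = tp + t, fp + f, fn + n
        score = f_beta(tp, fp, fn, beta)
        if score > best_score:
            best_score, best_choice = score, dict(zip(types, choice))
    return best_score, best_choice


counts = {  # toy per-type counts for two hypothetical systems
    "A": {"DET": (8, 2, 2), "PREP": (3, 4, 5)},
    "B": {"DET": (6, 1, 4), "PREP": (5, 1, 3)},
}
score, choice = best_combination(counts)
print(round(score, 4), choice)  # 0.7927 {'DET': 'A', 'PREP': 'B'}
```

Because F0.5 is computed on pooled counts, the objective is nonlinear in the selection variables, and enumeration grows as S^T for S systems and T error types, which motivates a proper IP formulation at realistic sizes.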
CL · Jun 20, 2018
Multi-Layer Ensembling Techniques for Multilingual Intent Classification
Charles Costello, Ruixi Lin, Vishwas Mruthyunjaya et al.
In this paper, we examine how multi-layer ensembling improves performance on multilingual intent classification. We develop a novel multi-layer ensembling approach that ensembles both different model initializations and different model architectures. We also introduce a new banking domain dataset and compare results against the standard ATIS dataset and the Chinese SMP2017 dataset to determine ensembling performance in multilingual and multi-domain contexts. We run ensemble experiments across all three datasets and conclude that ensembling provides significant performance increases and that multi-layer ensembling is a no-risk way to improve performance on intent classification. We also find that a diverse ensemble of simple models can reach performance comparable to much more sophisticated state-of-the-art models. Our best F1 scores on ATIS, Banking, and SMP are 97.54%, 91.79%, and 93.55%, respectively, which compare well with the state of the art on ATIS and the best submission to the SMP2017 competition. The total F1 gains we achieve from ensembling are 0.23%, 1.96%, and 4.04%, respectively.
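The abstract does not specify how the layers are combined. The Python sketch below assumes probability averaging at both layers: first across differently initialized runs of one architecture, then across architectures. The architecture names and probabilities are made up.

```python
from typing import Dict, List, Sequence, Tuple


def average(dists: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally long probability vectors."""
    return [sum(col) / len(dists) for col in zip(*dists)]


def multi_layer_ensemble(outputs: Dict[str, Sequence[Sequence[float]]]) -> Tuple[int, List[float]]:
    """Layer 1 averages runs (initializations) within an architecture; layer 2 averages architectures."""
    per_architecture = [average(runs) for runs in outputs.values()]
    final = average(per_architecture)
    return max(range(len(final)), key=final.__getitem__), final


outputs = {  # intent probabilities for one utterance, two classes
    "cnn": [[0.6, 0.4], [0.2, 0.8]],  # two random initializations
    "lstm": [[0.7, 0.3]],             # one initialization
}
pred, final = multi_layer_ensemble(outputs)
print(pred, [round(p, 2) for p in final])  # 0 [0.55, 0.45]
```

A flat average over all three runs would give roughly a 0.5/0.5 tie here; the layered version gives each architecture an equal vote regardless of how many runs it contributes.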
CL · Jun 18, 2018
Combining Word Feature Vector Method with the Convolutional Neural Network for Slot Filling in Spoken Language Understanding
Ruixi Lin
Slot filling is an important problem in Spoken Language Understanding (SLU) and Natural Language Processing (NLP); it involves identifying a user's intent and assigning a semantic concept to each word in a sentence. This paper presents a word feature vector method and combines it with a convolutional neural network (CNN). We consider 18 word features, each constructed by merging similar word labels. By introducing the concept of an external library, we propose a feature set approach that helps build the relationship between a word from the training dataset and the feature. Computational results are reported on the ATIS dataset, along with comparisons against a traditional CNN and a bi-directional sequential CNN.
CL · May 23, 2018
Enhancing Chinese Intent Classification by Dynamically Integrating Character Features into Word Embeddings with Ensemble Techniques
Ruixi Lin, Charles Costello, Charles Jankowski
Intent classification has been widely researched on English data with deep learning approaches based on neural networks and word embeddings. The challenge for Chinese intent classification stems from the fact that, unlike English, where most words are made up of the 26 letters of a phonographic alphabet, Chinese is logographic: a Chinese character is a more basic semantic unit that can be informative on its own, and its meaning does not vary much across contexts. Chinese word embeddings alone can be inadequate for representing words, and pre-trained embeddings may not align well with the task at hand. To address this inadequacy and leverage Chinese character information, we propose a low-effort, generic way to dynamically integrate character-embedding-based feature maps with word-embedding-based inputs; the resulting word-character embeddings are stacked with a contextual information extraction module to further incorporate context information for predictions. On top of the proposed model, we employ an ensemble method to combine single models and obtain the final result. The approach is data-independent and does not rely on external sources such as pre-trained word embeddings. The proposed model outperforms baseline models and existing methods.
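A dependency-free Python sketch of one way to integrate character features with a word embedding: a width-2 convolution over the word's character embeddings, max-pooled into one value per filter, then concatenated with the word embedding. The concatenation rule, dimensions, and all embedding and filter values are assumptions for illustration; the paper's dynamic integration may differ.

```python
from typing import Dict, List, Sequence


def conv_max_pool(char_vecs: Sequence[Sequence[float]], filters: Sequence[Sequence[float]], width: int) -> List[float]:
    """1-D convolution over a word's characters, then max-pooling over positions (one value per filter)."""
    dim = len(char_vecs[0])
    seq = list(char_vecs) + [[0.0] * dim] * max(0, width - len(char_vecs))  # zero-pad short words
    features = []
    for f in filters:  # each filter holds width * dim weights
        windows = ([x for vec in seq[i:i + width] for x in vec] for i in range(len(seq) - width + 1))
        features.append(max(sum(w * x for w, x in zip(f, win)) for win in windows))
    return features


def word_char_embedding(word: str, word_emb: Dict[str, List[float]], char_emb: Dict[str, List[float]],
                        filters: Sequence[Sequence[float]], width: int = 2) -> List[float]:
    """Concatenate the word's embedding with character-derived CNN features (assumed integration rule)."""
    char_vecs = [char_emb[ch] for ch in word]
    return list(word_emb[word]) + conv_max_pool(char_vecs, filters, width)


char_emb = {"银": [1.0, 0.0, 0.0], "行": [0.0, 1.0, 0.0]}  # toy 3-d character embeddings
word_emb = {"银行": [0.5, 0.5]}                             # toy 2-d word embedding ("bank")
filters = [[1.0, 0.0, 0.0, 0.0, 1.0, 0.0]]                  # one width-2 filter (2 chars x 3 dims)
print(word_char_embedding("银行", word_emb, char_emb, filters))  # [0.5, 0.5, 2.0]
```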
CL · Sep 7, 2016
Sentiment Classification of Food Reviews
Hua Feng, Ruixi Lin
Sentiment analysis of reviews is a popular task in natural language processing. In this work, the goal is to predict the score of food reviews on a scale of 1 to 5 with two carefully tuned recurrent neural networks. As a baseline, we train a simple RNN for classification. We then extend the baseline to a GRU. In addition, we present two different methods for dealing with highly skewed data, a common problem for reviews. Models are evaluated by accuracy.
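The abstract does not name its two methods for skewed data. One common technique, sketched below in Python as an example rather than as the paper's choice, is inverse-frequency class weighting, w_c = N / (K * n_c), the same heuristic as scikit-learn's `class_weight="balanced"`, applied inside the cross-entropy loss. The ratings are toy data.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Mapping, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Inverse-frequency weights w_c = N / (K * n_c): each class contributes N / K in total."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * count) for c, count in counts.items()}


def weighted_cross_entropy(probs: Mapping[Hashable, float], label: Hashable,
                           weights: Mapping[Hashable, float]) -> float:
    """Cross-entropy of one example, scaled by the weight of its gold class."""
    return -weights[label] * math.log(probs[label])


scores = [5] * 12 + [4] * 4 + [3, 2, 1, 1]  # toy review ratings, skewed toward 5 stars
weights = balanced_class_weights(scores)
print({c: round(w, 3) for c, w in sorted(weights.items())})
# {1: 2.0, 2: 4.0, 3: 4.0, 4: 1.0, 5: 0.333}
```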
CL · May 13, 2016
Towards Empathetic Human-Robot Interactions
Pascale Fung, Dario Bertero, Yan Wan et al.
Since the late 1990s, when speech companies began offering customer-service software on the market, people have grown used to speaking to machines. As people interact more often with voice- and gesture-controlled machines, they expect the machines to recognize different emotions and to understand other high-level communication features such as humor, sarcasm, and intention. To make such communication possible, machines need an empathy module that can extract emotions from human speech and behavior and decide on the robot's correct response. Although research on empathetic robots is still at an early stage, we describe our approach, which uses signal processing techniques, sentiment analysis, and machine learning algorithms to build robots that can "understand" human emotion. We propose Zara the Supergirl as a prototype system of empathetic robots. Zara is a software-based virtual android that presents herself on screen as an animated cartoon character. She will get "smarter" and more empathetic through her deep learning algorithms, by gathering more data and learning from it. In this paper, we present our work so far on deep learning for emotion and sentiment recognition, as well as humor recognition. We hope to explore future directions of android development and how it can help improve people's lives.