Yao Qiang

LG
h-index50
17papers
303citations
Novelty58%
AI Score46

17 Papers

CVAug 28, 2023Code
AutoProSAM: Automated Prompting SAM for 3D Multi-Organ Segmentation

Chengyin Li, Prashant Khanduri, Yao Qiang et al.

Segment Anything Model (SAM) is one of the pioneering prompt-based foundation models for image segmentation and has been rapidly adopted for various medical imaging applications. However, in clinical settings, creating effective prompts is notably challenging and time-consuming, requiring the expertise of domain specialists such as physicians. This requirement significantly diminishes SAM's primary advantage, its interactive capability with end users, in medical applications. Moreover, recent studies have indicated that SAM, originally designed for 2D natural images, performs suboptimally on 3D medical image segmentation tasks. This subpar performance is attributed to the domain gaps between natural and medical images and the disparities in spatial arrangements between 2D and 3D images, particularly in multi-organ segmentation applications. To overcome these challenges, we present a novel technique termed AutoProSAM. This method automates 3D multi-organ CT-based segmentation by leveraging SAM's foundational model capabilities without relying on domain experts for prompts. The approach utilizes parameter-efficient adaptation techniques to adapt SAM for 3D medical imagery and incorporates an effective automatic prompt learning paradigm specific to this domain. By eliminating the need for manual prompts, it enhances SAM's capabilities for 3D medical image segmentation and achieves state-of-the-art (SOTA) performance in CT-based multi-organ segmentation tasks. The code is in this {\href{https://github.com/ChengyinLee/AutoProSAM_2024}{link}}.

IVOct 6, 2022Code
FocalUNETR: A Focal Transformer for Boundary-aware Segmentation of CT Images

Chengyin Li, Yao Qiang, Rafi Ibn Sultan et al.

Computed Tomography (CT) based precise prostate segmentation for treatment planning is challenging due to (1) the unclear boundary of the prostate derived from CT's poor soft tissue contrast and (2) the limitation of convolutional neural network-based models in capturing long-range global context. Here we propose a novel focal transformer-based image segmentation architecture to effectively and efficiently extract local visual features and global context from CT images. Additionally, we design an auxiliary boundary-induced label regression task coupled with the main prostate segmentation task to address the unclear boundary issue in CT images. We demonstrate that this design significantly improves the quality of the CT-based prostate segmentation task over other competing methods, resulting in substantially improved performance, i.e., higher Dice Similarity Coefficient, lower Hausdorff Distance, and Average Symmetric Surface Distance, on both private and public CT image datasets. Our code is available at this \href{https://github.com/ChengyinLee/FocalUNETR.git}{link}.

CVSep 14, 2023Code
Interpretability-Aware Vision Transformer

Yao Qiang, Chengyin Li, Prashant Khanduri et al.

Vision Transformers (ViTs) have become prominent models for solving various vision tasks. However, the interpretability of ViTs has not kept pace with their promising performance. While there has been a surge of interest in developing {\it post hoc} solutions to explain ViTs' outputs, these methods do not generalize to different downstream tasks and various transformer architectures. Furthermore, if ViTs are not properly trained with the given data and do not prioritize the region of interest, the {\it post hoc} methods would be less effective. Instead of developing another {\it post hoc} approach, we introduce a novel training procedure that inherently enhances model interpretability. Our interpretability-aware ViT (IA-ViT) draws inspiration from a fresh insight: both the class patch and image patches consistently generate predicted distributions and attention maps. IA-ViT is composed of a feature extractor, a predictor, and an interpreter, which are trained jointly with an interpretability-aware training objective. Consequently, the interpreter simulates the behavior of the predictor and provides a faithful explanation through its single-head self-attention mechanism. Our comprehensive experimental results demonstrate the effectiveness of IA-ViT in several image classification tasks, with both qualitative and quantitative evaluations of model performance and interpretability. Source code is available from: https://github.com/qiangyao1988/IA-ViT.

CVJan 31, 2023Code
Fairness-aware Vision Transformer via Debiased Self-Attention

Yao Qiang, Chengyin Li, Prashant Khanduri et al.

Vision Transformer (ViT) has recently gained significant attention in solving computer vision (CV) problems due to its capability of extracting informative features and modeling long-range dependencies through the attention mechanism. Whereas recent works have explored the trustworthiness of ViT, including its robustness and explainability, the issue of fairness has not yet been adequately addressed. We establish that the existing fairness-aware algorithms designed for CNNs do not perform well on ViT, which highlights the need to develop our novel framework via Debiased Self-Attention (DSA). DSA is a fairness-through-blindness approach that enforces ViT to eliminate spurious features correlated with the sensitive label for bias mitigation and simultaneously retain real features for target prediction. Notably, DSA leverages adversarial examples to locate and mask the spurious features in the input image patches with an additional attention weights alignment regularizer in the training objective to encourage learning real features for target prediction. Importantly, our DSA framework leads to improved fairness guarantees over prior works on multiple prediction tasks without compromising target prediction performance. Code is available at \href{https://github.com/qiangyao1988/DSA}{https://github.com/qiangyao1988/DSA}.

LGJan 17, 2023Code
Negative Flux Aggregation to Estimate Feature Attributions

Xin Li, Deng Pan, Chengyin Li et al.

There are increasing demands for understanding deep neural networks' (DNNs) behavior spurred by growing security and/or transparency concerns. Due to multi-layer nonlinearity of the deep neural network architectures, explaining DNN predictions still remains as an open problem, preventing us from gaining a deeper understanding of the mechanisms. To enhance the explainability of DNNs, we estimate the input feature's attributions to the prediction task using divergence and flux. Inspired by the divergence theorem in vector analysis, we develop a novel Negative Flux Aggregation (NeFLAG) formulation and an efficient approximation algorithm to estimate attribution map. Unlike the previous techniques, ours doesn't rely on fitting a surrogate model nor need any path integration of gradients. Both qualitative and quantitative experiments demonstrate a superior performance of NeFLAG in generating more faithful attribution maps than the competing methods. Our code is available at \url{https://github.com/xinli0928/NeFLAG}

IVSep 9, 2022
Saliency Guided Adversarial Training for Learning Generalizable Features with Applications to Medical Imaging Classification System

Xin Li, Yao Qiang, Chengyin Li et al.

This work tackles a central machine learning problem of performance degradation on out-of-distribution (OOD) test sets. The problem is particularly salient in medical imaging based diagnosis system that appears to be accurate but fails when tested in new hospitals/datasets. Recent studies indicate the system might learn shortcut and non-relevant features instead of generalizable features, so-called good features. We hypothesize that adversarial training can eliminate shortcut features whereas saliency guided training can filter out non-relevant features; both are nuisance features accounting for the performance degradation on OOD test sets. With that, we formulate a novel model training scheme for the deep neural network to learn good features for classification and/or detection tasks ensuring a consistent generalization performance on OOD test sets. The experimental results qualitatively and quantitatively demonstrate the superior performance of our method using the benchmark CXR image data sets on classification tasks.

LGNov 16, 2023
Hijacking Large Language Models via Adversarial In-Context Learning

Xiangyu Zhou, Yao Qiang, Saleh Zare Zade et al.

In-context learning (ICL) has emerged as a powerful paradigm leveraging LLMs for specific downstream tasks by utilizing labeled examples as demonstrations (demos) in the preconditioned prompts. Despite its promising performance, crafted adversarial attacks pose a notable threat to the robustness of LLMs. Existing attacks are either easy to detect, require a trigger in user input, or lack specificity towards ICL. To address these issues, this work introduces a novel transferable prompt injection attack against ICL, aiming to hijack LLMs to generate the target output or elicit harmful responses. In our threat model, the hacker acts as a model publisher who leverages a gradient-based prompt search method to learn and append imperceptible adversarial suffixes to the in-context demos via prompt injection. We also propose effective defense strategies using a few shots of clean demos, enhancing the robustness of LLMs during ICL. Extensive experimental results across various classification and jailbreak tasks demonstrate the effectiveness of the proposed attack and defense strategies. This work highlights the significant security vulnerabilities of LLMs during ICL and underscores the need for further in-depth studies.

LGNov 23, 2022
Learning Compact Features via In-Training Representation Alignment

Xin Li, Xiangrui Li, Deng Pan et al.

Deep neural networks (DNNs) for supervised learning can be viewed as a pipeline of the feature extractor (i.e., last hidden layer) and a linear classifier (i.e., output layer) that are trained jointly with stochastic gradient descent (SGD) on the loss function (e.g., cross-entropy). In each epoch, the true gradient of the loss function is estimated using a mini-batch sampled from the training set and model parameters are then updated with the mini-batch gradients. Although the latter provides an unbiased estimation of the former, they are subject to substantial variances derived from the size and number of sampled mini-batches, leading to noisy and jumpy updates. To stabilize such undesirable variance in estimating the true gradients, we propose In-Training Representation Alignment (ITRA) that explicitly aligns feature distributions of two different mini-batches with a matching loss in the SGD training process. We also provide a rigorous analysis of the desirable effects of the matching loss on feature representation learning: (1) extracting compact feature representation; (2) reducing over-adaption on mini-batches via an adaptive weighting mechanism; and (3) accommodating to multi-modalities. Finally, we conduct large-scale experiments on both image and text classifications to demonstrate its superior performance to the strong baselines.

CLSep 23, 2024
Generative LLM Powered Conversational AI Application for Personalized Risk Assessment: A Case Study in COVID-19

Mohammad Amin Roshani, Xiangyu Zhou, Yao Qiang et al.

Large language models (LLMs) have shown remarkable capabilities in various natural language tasks and are increasingly being applied in healthcare domains. This work demonstrates a new LLM-powered disease risk assessment approach via streaming human-AI conversation, eliminating the need for programming required by traditional machine learning approaches. In a COVID-19 severity risk assessment case study, we fine-tune pre-trained generative LLMs (e.g., Llama2-7b and Flan-t5-xl) using a few shots of natural language examples, comparing their performance with traditional classifiers (i.e., Logistic Regression, XGBoost, Random Forest) that are trained de novo using tabular data across various experimental settings. We develop a mobile application that uses these fine-tuned LLMs as its generative AI (GenAI) core to facilitate real-time interaction between clinicians and patients, providing no-code risk assessment through conversational interfaces. This integration not only allows for the use of streaming Questions and Answers (QA) as inputs but also offers personalized feature importance analysis derived from the LLM's attention layers, enhancing the interpretability of risk assessments. By achieving high Area Under the Curve (AUC) scores with a limited number of fine-tuning samples, our results demonstrate the potential of generative LLMs to outperform discriminative classification methods in low-data regimes, highlighting their real-world adaptability and effectiveness. This work aims to fill the existing gap in leveraging generative LLMs for interactive no-code risk assessment and to encourage further research in this emerging field.

LGNov 21, 2023
FedDRO: Federated Compositional Optimization for Distributionally Robust Learning

Prashant Khanduri, Chengyin Li, Rafi Ibn Sultan et al.

Recently, compositional optimization (CO) has gained popularity because of its applications in distributionally robust optimization (DRO) and many other machine learning problems. Large-scale and distributed availability of data demands the development of efficient federated learning (FL) algorithms for solving CO problems. Developing FL algorithms for CO is particularly challenging because of the compositional nature of the objective. Moreover, current state-of-the-art methods to solve such problems rely on large batch gradients (depending on the solution accuracy) not feasible for most practical settings. To address these challenges, in this work, we propose efficient FedAvg-type algorithms for solving non-convex CO in the FL setting. We first establish that vanilla FedAvg is not suitable to solve distributed CO problems because of the data heterogeneity in the compositional objective at each client which leads to the amplification of bias in the local compositional gradient estimates. To this end, we propose a novel FL framework FedDRO that utilizes the DRO problem structure to design a communication strategy that allows FedAvg to control the bias in the estimation of the compositional gradient. A key novelty of our work is to develop solution accuracy-independent algorithms that do not require large batch gradients (and function evaluations) for solving federated CO problems. We establish $\mathcal{O}(ε^{-2})$ sample and $\mathcal{O}(ε^{-3/2})$ communication complexity in the FL setting while achieving linear speedup with the number of clients. We corroborate our theoretical findings with empirical studies on large-scale DRO problems.

CLMar 3
RAG-X: Systematic Diagnosis of Retrieval-Augmented Generation for Medical Question Answering

Aswini Sivakumar, Vijayan Sugumaran, Yao Qiang

Automated question-answering (QA) systems increasingly rely on retrieval-augmented generation (RAG) to ground large language models (LLMs) in authoritative medical knowledge, ensuring clinical accuracy and patient safety in Artificial Intelligence (AI) applications for healthcare. Despite progress in RAG evaluation, current benchmarks focus only on simple multiple-choice QA tasks and employ metrics that poorly capture the semantic precision required for complex QA tasks. These approaches fail to diagnose whether an error stems from faulty retrieval or flawed generation, limiting developers from performing targeted improvement. To address this gap, we propose RAG-X, a diagnostic framework that evaluates the retriever and generator independently across a triad of QA tasks: information extraction, short-answer generation, and multiple-choice question (MCQ) answering. RAG-X introduces Context Utilization Efficiency (CUE) metrics to disaggregate system success into interpretable quadrants, isolating verified grounding from deceptive accuracy. Our experiments reveal an ``Accuracy Fallacy", where a 14\% gap separates perceived system success from evidence-based grounding. By surfacing hidden failure modes, RAG-X offers the diagnostic transparency needed for safe and verifiable clinical RAG systems.

LGMay 6, 2025Code
Automatic Calibration for Membership Inference Attack on Large Language Models

Saleh Zare Zade, Yao Qiang, Xiangyu Zhou et al.

Membership Inference Attacks (MIAs) have recently been employed to determine whether a specific text was part of the pre-training data of Large Language Models (LLMs). However, existing methods often misinfer non-members as members, leading to a high false positive rate, or depend on additional reference models for probability calibration, which limits their practicality. To overcome these challenges, we introduce a novel framework called Automatic Calibration Membership Inference Attack (ACMIA), which utilizes a tunable temperature to calibrate output probabilities effectively. This approach is inspired by our theoretical insights into maximum likelihood estimation during the pre-training of LLMs. We introduce ACMIA in three configurations designed to accommodate different levels of model access and increase the probability gap between members and non-members, improving the reliability and robustness of membership inference. Extensive experiments on various open-source LLMs demonstrate that our proposed attack is highly effective, robust, and generalizable, surpassing state-of-the-art baselines across three widely used benchmarks. Our code is available at: \href{https://github.com/Salehzz/ACMIA}{\textcolor{blue}{Github}}.

CLFeb 24, 2024
Prompt Perturbation Consistency Learning for Robust Language Models

Yao Qiang, Subhrangshu Nandi, Ninareh Mehrabi et al.

Large language models (LLMs) have demonstrated impressive performance on a number of natural language processing tasks, such as question answering and text summarization. However, their performance on sequence labeling tasks such as intent classification and slot filling (IC-SF), which is a central component in personal assistant systems, lags significantly behind discriminative models. Furthermore, there is a lack of substantive research on the robustness of LLMs to various perturbations in the input prompts. The contributions of this paper are three-fold. First, we show that fine-tuning sufficiently large LLMs can produce IC-SF performance comparable to discriminative models. Next, we systematically analyze the performance deterioration of those fine-tuned models due to three distinct yet relevant types of input perturbations - oronyms, synonyms, and paraphrasing. Finally, we propose an efficient mitigation approach, Prompt Perturbation Consistency Learning (PPCL), which works by regularizing the divergence between losses from clean and perturbed samples. Our experiments demonstrate that PPCL can recover on average 59% and 69% of the performance drop for IC and SF tasks, respectively. Furthermore, PPCL beats the data augmentation approach while using ten times fewer augmented data samples.

LGFeb 21, 2024
Learning to Poison Large Language Models for Downstream Manipulation

Xiangyu Zhou, Yao Qiang, Saleh Zare Zade et al.

The advent of Large Language Models (LLMs) has marked significant achievements in language processing and reasoning capabilities. Despite their advancements, LLMs face vulnerabilities to data poisoning attacks, where the adversary inserts backdoor triggers into training data to manipulate outputs. This work further identifies additional security risks in LLMs by designing a new data poisoning attack tailored to exploit the supervised fine-tuning (SFT) process. We propose a novel gradient-guided backdoor trigger learning (GBTL) algorithm to identify adversarial triggers efficiently, ensuring an evasion of detection by conventional defenses while maintaining content integrity. Through experimental validation across various language model tasks, including sentiment analysis, domain generation, and question answering, our poisoning strategy demonstrates a high success rate in compromising various LLMs' outputs. We further propose two defense strategies against data poisoning attacks, including in-context learning (ICL) and continuous learning (CL), which effectively rectify the behavior of LLMs and significantly reduce the decline in performance. Our work highlights the significant security risks present during SFT of LLMs and the necessity of safeguarding LLMs against data poisoning attacks.

LGJun 3, 2025
Not All Tokens Are Meant to Be Forgotten

Xiangyu Zhou, Yao Qiang, Saleh Zare Zade et al.

Large Language Models (LLMs), pre-trained on massive text corpora, exhibit remarkable human-level language understanding, reasoning, and decision-making abilities. However, they tend to memorize unwanted information, such as private or copyrighted content, raising significant privacy and legal concerns. Unlearning has emerged as a promising solution, but existing methods face a significant challenge of over-forgetting. This issue arises because they indiscriminately suppress the generation of all the tokens in forget samples, leading to a substantial loss of model utility. To overcome this challenge, we introduce the Targeted Information Forgetting (TIF) framework, which consists of (1) a flexible targeted information identifier designed to differentiate between unwanted words (UW) and general words (GW) in the forget samples, and (2) a novel Targeted Preference Optimization approach that leverages Logit Preference Loss to unlearn unwanted information associated with UW and Preservation Loss to retain general information in GW, effectively improving the unlearning process while mitigating utility degradation. Extensive experiments on the TOFU and MUSE benchmarks demonstrate that the proposed TIF framework enhances unlearning effectiveness while preserving model utility and achieving state-of-the-art results.

LGJan 10, 2021
Adversarially Robust and Explainable Model Compression with On-Device Personalization for Text Classification

Yao Qiang, Supriya Tumkur Suresh Kumar, Marco Brocanelli et al.

On-device Deep Neural Networks (DNNs) have recently gained more attention due to the increasing computing power of the mobile devices and the number of applications in Computer Vision (CV), Natural Language Processing (NLP), and Internet of Things (IoTs). Unfortunately, the existing efficient convolutional neural network (CNN) architectures designed for CV tasks are not directly applicable to NLP tasks and the tiny Recurrent Neural Network (RNN) architectures have been designed primarily for IoT applications. In NLP applications, although model compression has seen initial success in on-device text classification, there are at least three major challenges yet to be addressed: adversarial robustness, explainability, and personalization. Here we attempt to tackle these challenges by designing a new training scheme for model compression and adversarial robustness, including the optimization of an explainable feature mapping objective, a knowledge distillation objective, and an adversarially robustness objective. The resulting compressed model is personalized using on-device private training data via fine-tuning. We perform extensive experiments to compare our approach with both compact RNN (e.g., FastGRNN) and compressed RNN (e.g., PRADO) architectures in both natural and adversarial NLP test settings.

CLMar 22, 2020
Toward Tag-free Aspect Based Sentiment Analysis: A Multiple Attention Network Approach

Yao Qiang, Xin Li, Dongxiao Zhu

Existing aspect based sentiment analysis (ABSA) approaches leverage various neural network models to extract the aspect sentiments via learning aspect-specific feature representations. However, these approaches heavily rely on manual tagging of user reviews according to the predefined aspects as the input, a laborious and time-consuming process. Moreover, the underlying methods do not explain how and why the opposing aspect level polarities in a user review lead to the overall polarity. In this paper, we tackle these two problems by designing and implementing a new Multiple-Attention Network (MAN) approach for more powerful ABSA without the need for aspect tags using two new tag-free data sets crawled directly from TripAdvisor ({https://www.tripadvisor.com}). With the Self- and Position-Aware attention mechanism, MAN is capable of extracting both aspect level and overall sentiments from the text reviews using the aspect level and overall customer ratings, and it can also detect the vital aspect(s) leading to the overall sentiment polarity among different aspects via a new aspect ranking scheme. We carry out extensive experiments to demonstrate the strong performance of MAN compared to other state-of-the-art ABSA approaches and the explainability of our approach by visualizing and interpreting attention weights in case studies.