CVOct 25, 2023Code
MACP: Efficient Model Adaptation for Cooperative PerceptionYunsheng Ma, Juanwu Lu, Can Cui et al.
Vehicle-to-vehicle (V2V) communications have greatly enhanced the perception capabilities of connected and automated vehicles (CAVs) by enabling information sharing to "see through the occlusions", resulting in significant performance improvements. However, developing and training complex multi-agent perception models from scratch can be expensive and unnecessary when existing single-agent models show remarkable generalization capabilities. In this paper, we propose a new framework termed MACP, which equips a single-agent pre-trained model with cooperation capabilities. We approach this objective by identifying the key challenges of shifting from single-agent to cooperative settings, adapting the model by freezing most of its parameters and adding a few lightweight modules. We demonstrate in our experiments that the proposed framework can effectively utilize cooperative observations and outperform other state-of-the-art approaches in both simulated and real-world cooperative perception benchmarks while requiring substantially fewer tunable parameters with reduced communication costs. Our source code is available at https://github.com/PurdueDigitalTwin/MACP.
CVOct 30, 2022Code
ViTASD: Robust Vision Transformer Baselines for Autism Spectrum Disorder Facial DiagnosisXu Cao, Wenqian Ye, Elena Sizikova et al.
Autism spectrum disorder (ASD) is a lifelong neurodevelopmental disorder with very high prevalence around the world. Research progress in the field of ASD facial analysis in pediatric patients has been hindered due to a lack of well-established baselines. In this paper, we propose the use of the Vision Transformer (ViT) for the computational analysis of pediatric ASD. The presented model, known as ViTASD, distills knowledge from large facial expression datasets and offers model structure transferability. Specifically, ViTASD employs a vanilla ViT to extract features from patients' face images and adopts a lightweight decoder with a Gaussian Process layer to enhance the robustness for ASD analysis. Extensive experiments conducted on standard ASD facial analysis benchmarks show that our method outperforms all of the representative approaches in ASD facial analysis, while the ViTASD-L achieves a new state-of-the-art. Our code and pretrained models are available at https://github.com/IrohXu/ViTASD.
AINov 21, 2023
A Survey on Multimodal Large Language Models for Autonomous DrivingCan Cui, Yunsheng Ma, Xu Cao et al.
With the emergence of Large Language Models (LLMs) and Vision Foundation Models (VFMs), multimodal AI systems benefiting from large models have the potential to equally perceive the real world, make decisions, and control tools as humans. In recent months, LLMs have shown widespread attention in autonomous driving and map systems. Despite its immense potential, there is still a lack of a comprehensive understanding of key challenges, opportunities, and future endeavors to apply in LLM driving systems. In this paper, we present a systematic investigation in this field. We first introduce the background of Multimodal Large Language Models (MLLMs), the multimodal models development using LLMs, and the history of autonomous driving. Then, we overview existing MLLM tools for driving, transportation, and map systems together with existing datasets and benchmarks. Moreover, we summarized the works in The 1st WACV Workshop on Large Language and Vision Models for Autonomous Driving (LLVM-AD), which is the first workshop of its kind regarding LLMs in autonomous driving. To further promote the development of this field, we also discuss several important problems regarding using MLLMs in autonomous driving systems that need to be solved by both academia and industry.
CVSep 4, 2024Code
Benchmarking Spurious Bias in Few-Shot Image ClassifiersGuangtao Zheng, Wenqian Ye, Aidong Zhang
Few-shot image classifiers are designed to recognize and classify new data with minimal supervision and limited data but often show reliance on spurious correlations between classes and spurious attributes, known as spurious bias. Spurious correlations commonly hold in certain samples and few-shot classifiers can suffer from spurious bias induced from them. There is an absence of an automatic benchmarking system to assess the robustness of few-shot classifiers against spurious bias. In this paper, we propose a systematic and rigorous benchmark framework, termed FewSTAB, to fairly demonstrate and quantify varied degrees of robustness of few-shot classifiers to spurious bias. FewSTAB creates few-shot evaluation tasks with biased attributes so that using them for predictions can demonstrate poor performance. To construct these tasks, we propose attribute-based sample selection strategies based on a pre-trained vision-language model, eliminating the need for manual dataset curation. This allows FewSTAB to automatically benchmark spurious bias using any existing test data. FewSTAB offers evaluation results in a new dimension along with a new design guideline for building robust classifiers. Moreover, it can benchmark spurious bias in varied degrees and enable designs for varied degrees of robustness. Its effectiveness is demonstrated through experiments on ten few-shot learning methods across three datasets. We hope our framework can inspire new designs of robust few-shot classifiers. Our code is available at https://github.com/gtzheng/FewSTAB.
HCSep 19, 2023
Drive as You Speak: Enabling Human-Like Interaction with Large Language Models in Autonomous VehiclesCan Cui, Yunsheng Ma, Xu Cao et al.
The future of autonomous vehicles lies in the convergence of human-centric design and advanced AI capabilities. Autonomous vehicles of the future will not only transport passengers but also interact and adapt to their desires, making the journey comfortable, efficient, and pleasant. In this paper, we present a novel framework that leverages Large Language Models (LLMs) to enhance autonomous vehicles' decision-making processes. By integrating LLMs' natural language capabilities and contextual understanding, specialized tools usage, synergizing reasoning, and acting with various modules on autonomous vehicles, this framework aims to seamlessly integrate the advanced language and reasoning capabilities of LLMs into autonomous vehicles. The proposed framework holds the potential to revolutionize the way autonomous vehicles operate, offering personalized assistance, continuous learning, and transparent decision-making, ultimately contributing to safer and more efficient autonomous driving technologies.
HCOct 12, 2023
Receive, Reason, and React: Drive as You Say with Large Language Models in Autonomous VehiclesCan Cui, Yunsheng Ma, Xu Cao et al.
The fusion of human-centric design and artificial intelligence (AI) capabilities has opened up new possibilities for next-generation autonomous vehicles that go beyond transportation. These vehicles can dynamically interact with passengers and adapt to their preferences. This paper proposes a novel framework that leverages Large Language Models (LLMs) to enhance the decision-making process in autonomous vehicles. By utilizing LLMs' linguistic and contextual understanding abilities with specialized tools, we aim to integrate the language and reasoning capabilities of LLMs into autonomous vehicles. Our research includes experiments in HighwayEnv, a collection of environments for autonomous driving and tactical decision-making tasks, to explore LLMs' interpretation, interaction, and reasoning in various scenarios. We also examine real-time personalization, demonstrating how LLMs can influence driving behaviors based on verbal commands. Our empirical results highlight the substantial advantages of utilizing chain-of-thought prompting, leading to improved driving decisions, and showing the potential for LLMs to enhance personalized driving experiences through ongoing verbal feedback. The proposed framework aims to transform autonomous vehicle operations, offering personalized support, transparent decision-making, and continuous learning to enhance safety and effectiveness. We achieve user-centric, transparent, and adaptive autonomous driving ecosystems supported by the integration of LLMs into autonomous vehicles.
LGJun 12, 2023
Mitigating Transformer Overconfidence via Lipschitz RegularizationWenqian Ye, Yunsheng Ma, Xu Cao et al.
Though Transformers have achieved promising results in many computer vision tasks, they tend to be over-confident in predictions, as the standard Dot Product Self-Attention (DPSA) can barely preserve distance for the unbounded input domain. In this work, we fill this gap by proposing a novel Lipschitz Regularized Transformer (LRFormer). Specifically, we present a new similarity function with the distance within Banach Space to ensure the Lipschitzness and also regularize the term by a contractive Lipschitz Bound. The proposed method is analyzed with a theoretical guarantee, providing a rigorous basis for its effectiveness and reliability. Extensive experiments conducted on standard vision benchmarks demonstrate that our method outperforms the state-of-the-art single forward pass approaches in prediction, calibration, and uncertainty estimation.
CLDec 7, 2023Code
LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model ProgramsYunsheng Ma, Can Cui, Xu Cao et al.
Autonomous driving (AD) has made significant strides in recent years. However, existing frameworks struggle to interpret and execute spontaneous user instructions, such as "overtake the car ahead." Large Language Models (LLMs) have demonstrated impressive reasoning capabilities showing potential to bridge this gap. In this paper, we present LaMPilot, a novel framework that integrates LLMs into AD systems, enabling them to follow user instructions by generating code that leverages established functional primitives. We also introduce LaMPilot-Bench, the first benchmark dataset specifically designed to quantitatively evaluate the efficacy of language model programs in AD. Adopting the LaMPilot framework, we conduct extensive experiments to assess the performance of off-the-shelf LLMs on LaMPilot-Bench. Our results demonstrate the potential of LLMs in handling diverse driving scenarios and following user instructions in driving. To facilitate further research in this area, we release our code and data at https://github.com/PurdueDigitalTwin/LaMPilot.
AIMay 22
A Sober Look at Agentic Misalignment in Automated WorkflowsWenqian Ye, Bo Yuan, Zhichao Xu et al.
We study a class of emergent misalignment in multi-agent systems (MAS), with a focus on automated workflows, which we refer to agentic misalignment. Although these systems can solve complex tasks, they often fail because agents act according to implicit proxy utilities that do not align with the intended human goals. We formally define these behaviors and analyze them within a Bayesian framework, showing that generic utilities naturally lead to posterior collapse of agents in automated workflows. To address this issue, we propose Agentic Evidence Attribution (AEA), a novel alignment paradigm that improves agent posteriors using context-specific evidence. AEA reasons over agent actions and provides structured evidence to correct misaligned behavior during collaboration. To better understand the role of evidence, we study two instantiations of AEA: self-reflection (internal evidence from the model) and weak-to-strong generalization (external evidence on the agentic trajectory). We show that a small evidence model effectively aligns the MAS by providing orthogonal failure attribution. Our results clarify the sources of agentic misalignment in automated workflows and show that evidence-based alignment can effectively improve agent collaboration and leads to reliable multi-agent systems built on automated workflows.
LGMay 23, 2024Code
EHRMamba: Towards Generalizable and Scalable Foundation Models for Electronic Health RecordsAdibvafa Fallahpour, Mahshid Alinoori, Wenqian Ye et al.
Transformers have significantly advanced the modeling of Electronic Health Records (EHR), yet their deployment in real-world healthcare is limited by several key challenges. Firstly, the quadratic computational cost and insufficient context length of these models hinder hospitals' ability in processing the extensive medical histories typical in EHR data. Additionally, existing models employ separate finetuning for each clinical task, complicating maintenance in healthcare environments. Moreover, these models focus exclusively on either clinical prediction or EHR forecasting, lacking proficiency in both tasks. To overcome these limitations, we introduce EHRMamba, a robust foundation model built on the Mamba architecture. EHRMamba can process sequences up to 300% longer than previous models due to its linear computational cost. We also introduce a novel approach to Multitask Prompted Finetuning (MPF) for EHR data, which enables EHRMamba to simultaneously learn multiple clinical tasks in a single finetuning phase, significantly enhancing deployment and cross-task generalization. Furthermore, our model leverages the HL7 FHIR data standard to simplify integration into existing hospital systems. Alongside EHRMamba, we open-source Odyssey, a toolkit designed to support the development and deployment of EHR foundation models, with an emphasis on data standardization and interpretability. Our evaluations on the MIMIC-IV dataset demonstrate that EHRMamba advances state-of-the-art performance across 6 major clinical tasks and excels in EHR forecasting, marking a significant leap forward in the field.
IVSep 21, 2023
PIE: Simulating Disease Progression via Progressive Image EditingKaizhao Liang, Xu Cao, Kuei-Da Liao et al.
Disease progression simulation is a crucial area of research that has significant implications for clinical diagnosis, prognosis, and treatment. One major challenge in this field is the lack of continuous medical imaging monitoring of individual patients over time. To address this issue, we develop a novel framework termed Progressive Image Editing (PIE) that enables controlled manipulation of disease-related image features, facilitating precise and realistic disease progression simulation. Specifically, we leverage recent advancements in text-to-image generative models to simulate disease progression accurately and personalize it for each patient. We theoretically analyze the iterative refining process in our framework as a gradient descent with an exponentially decayed learning rate. To validate our framework, we conduct experiments in three medical imaging domains. Our results demonstrate the superiority of PIE over existing methods such as Stable Diffusion Walk and Style-Based Manifold Extrapolation based on CLIP score (Realism) and Disease Classification Confidence (Alignment). Our user study collected feedback from 35 veteran physicians to assess the generated progressions. Remarkably, 76.2% of the feedback agrees with the fidelity of the generated progressions. To our best knowledge, PIE is the first of its kind to generate disease progression images meeting real-world standards. It is a promising tool for medical research and clinical practice, potentially allowing healthcare providers to model disease trajectories over time, predict future treatment responses, and improve patient outcomes.
IVNov 16, 2024Code
MpoxVLM: A Vision-Language Model for Diagnosing Skin Lesions from Mpox Virus InfectionXu Cao, Wenqian Ye, Kenny Moise et al.
In the aftermath of the COVID-19 pandemic and amid accelerating climate change, emerging infectious diseases, particularly those arising from zoonotic spillover, remain a global threat. Mpox (caused by the monkeypox virus) is a notable example of a zoonotic infection that often goes undiagnosed, especially as its rash progresses through stages, complicating detection across diverse populations with different presentations. In August 2024, the WHO Director-General declared the mpox outbreak a public health emergency of international concern for a second time. Despite the deployment of deep learning techniques for detecting diseases from skin lesion images, a robust and publicly accessible foundation model for mpox diagnosis is still lacking due to the unavailability of open-source mpox skin lesion images, multimodal clinical data, and specialized training pipelines. To address this gap, we propose MpoxVLM, a vision-language model (VLM) designed to detect mpox by analyzing both skin lesion images and patient clinical information. MpoxVLM integrates the CLIP visual encoder, an enhanced Vision Transformer (ViT) classifier for skin lesions, and LLaMA-2-7B models, pre-trained and fine-tuned on visual instruction-following question-answer pairs from our newly released mpox skin lesion dataset. Our work achieves 90.38% accuracy for mpox detection, offering a promising pathway to improve early diagnostic accuracy in combating mpox.
CLOct 12, 2025Code
RECON: Reasoning with Condensation for Efficient Retrieval-Augmented GenerationZhichao Xu, Minheng Wang, Yawei Wang et al.
Retrieval-augmented generation (RAG) systems trained using reinforcement learning (RL) with reasoning are hampered by inefficient context management, where long, noisy retrieved documents increase costs and degrade performance. We introduce RECON (REasoning with CONdensation), a framework that integrates an explicit summarization module to compress evidence within the reasoning loop. Our summarizer is trained via a two-stage process: relevance pretraining on QA datasets, followed by multi-aspect distillation from proprietary LLMs to ensure factuality and clarity. Integrated into the Search-R1 pipeline, RECON reduces total context length by 35\%, leading to improved training speed and inference latency, while simultaneously improving RAG performance on downstream QA benchmarks. Notably, it boosts the average EM score of the 3B model by 14.5\% and the 7B model by 3.0\%, showing particular strength in multi-hop QA. RECON demonstrates that learned context compression is essential for building practical, scalable, and performant RAG systems. Our code implementation is made available at https://github.com/allfornancy/RECON.
CVJun 24, 2024Code
MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMsWenqian Ye, Bohan Liu, Guangtao Zheng et al.
Spurious bias, a tendency to exploit spurious correlations between superficial input attributes and prediction targets, has revealed a severe robustness pitfall in classical machine learning problems. Multimodal Large Language Models (MLLMs), which leverage pretrained vision and language models, have recently demonstrated strong capability in joint vision-language understanding. However, both the presence and severity of spurious biases in MLLMs remain poorly understood. In this work, we address this gap by analyzing the spurious biases in the multimodal setting and uncovering the specific inference-time data patterns that can manifest this problem. To support this analysis, we introduce MM-SpuBench, a comprehensive, human-verified benchmark dataset consisting of image-class pairs annotated with core and spurious attributes, grounded in our taxonomy of nine distinct types of spurious correlations. The benchmark is constructed using human-interpretable attribute information to capture a wide range of spurious patterns reflective of real-world knowledge. Leveraging this benchmark, we conduct a comprehensive evaluation of the state-of-the-art open-source and proprietary MLLMs with both standard accuracy and the proposed Conditional Generation Likelihood Advantage (CGLA). Our findings highlight the persistence of reliance on spurious correlations and the difficulty of mitigation on our benchmark. We hope this work can inspire new technical strides to mitigate these biases. Our benchmark is publicly available at https://huggingface.co/datasets/mmbench/MM-SpuBench.
CVJun 14, 2024Code
What is the Visual Cognition Gap between Humans and Multimodal LLMs?Xu Cao, Yifan Shen, Bolin Lai et al.
Recently, Multimodal Large Language Models (MLLMs) and Vision Language Models (VLMs) have shown great promise in language-guided perceptual tasks such as recognition, segmentation, and object detection. However, their effectiveness in addressing visual cognition problems that require high-level multi-image reasoning and visual working memory is not well-established. One such challenge is matrix reasoning - the cognitive ability to discern relationships among patterns in a set of images and extrapolate to predict subsequent patterns. This skill is crucial during the early neurodevelopmental stages of children. Inspired by the matrix reasoning tasks in Raven's Progressive Matrices (RPM) and Wechsler Intelligence Scale for Children (WISC), we propose a new dataset MaRs-VQA to evaluate the visual cognition capability of MLLMs and compare their performance with existing human visual cognition studies. Based on the training data of MaRs-VQA, we also finetune a baseline model Qwen2-VCog with multi-stage cognition reasoning annotations. Our comparative experiments with different baselines reveal a gap between MLLMs and human intelligence, highlighting the visual cognitive limitations of current MLLMs. We believe that the public release of MaRs-VQA and the Qwen2-VCog baseline model will drive progress toward the next generation of MLLMs with human-like visual cognition abilities. MaRs-VQA is available at huggingface.co/datasets/IrohXu/VCog-Bench. The training code of Qwen2-VCog is available at github.com/IrohXu/Cognition-MLLM.
LGFeb 20, 2024
The Clever Hans Mirage: A Comprehensive Survey on Spurious Correlations in Machine LearningWenqian Ye, Luyang Jiang, Eric Xie et al.
Back in the early 20th century, a horse named Hans appeared to perform arithmetic and other intellectual tasks during exhibitions in Germany, while it actually relied solely on involuntary cues in the body language from the human trainer. Modern machine learning models are no different. These models are known to be sensitive to spurious correlations between non-essential features of the inputs (e.g., background, texture, and secondary objects) and the corresponding labels. Such features and their correlations with the labels are known as "spurious" because they tend to change with shifts in real-world data distributions, which can negatively impact the model's generalization and robustness. In this paper, we provide a comprehensive survey of this emerging issue, along with a fine-grained taxonomy of existing state-of-the-art methods for addressing spurious correlations in machine learning models. Additionally, we summarize existing datasets, benchmarks, and metrics to facilitate future research. The paper concludes with a discussion of the broader impacts, the recent advancements, and future challenges in the era of generative AI, aiming to provide valuable insights for researchers in the related domains of the machine learning community.
AINov 17, 2024
On-Board Vision-Language Models for Personalized Autonomous Vehicle Motion Control: System Design and Real-World ValidationCan Cui, Zichong Yang, Yupeng Zhou et al.
Personalized driving refers to an autonomous vehicle's ability to adapt its driving behavior or control strategies to match individual users' preferences and driving styles while maintaining safety and comfort standards. However, existing works either fail to capture every individual preference precisely or become computationally inefficient as the user base expands. Vision-Language Models (VLMs) offer promising solutions to this front through their natural language understanding and scene reasoning capabilities. In this work, we propose a lightweight yet effective on-board VLM framework that provides low-latency personalized driving performance while maintaining strong reasoning capabilities. Our solution incorporates a Retrieval-Augmented Generation (RAG)-based memory module that enables continuous learning of individual driving preferences through human feedback. Through comprehensive real-world vehicle deployment and experiments, our system has demonstrated the ability to provide safe, comfortable, and personalized driving experiences across various scenarios and significantly reduce takeover rates by up to 76.9%. To the best of our knowledge, this work represents the first end-to-end VLM-based motion control system in real-world autonomous vehicles.
CVNov 18, 2024
Medical Video Generation for Disease Progression SimulationXu Cao, Kaizhao Liang, Kuei-Da Liao et al.
Modeling disease progression is crucial for improving the quality and efficacy of clinical diagnosis and prognosis, but it is often hindered by a lack of longitudinal medical image monitoring for individual patients. To address this challenge, we propose the first Medical Video Generation (MVG) framework that enables controlled manipulation of disease-related image and video features, allowing precise, realistic, and personalized simulations of disease progression. Our approach begins by leveraging large language models (LLMs) to recaption prompt for disease trajectory. Next, a controllable multi-round diffusion model simulates the disease progression state for each patient, creating realistic intermediate disease state sequence. Finally, a diffusion-based video transition generation model interpolates disease progression between these states. We validate our framework across three medical imaging domains: chest X-ray, fundus photography, and skin image. Our results demonstrate that MVG significantly outperforms baseline models in generating coherent and clinically plausible disease trajectories. Two user studies by veteran physicians, provide further validation and insights into the clinical utility of the generated sequences. MVG has the potential to assist healthcare providers in modeling disease trajectories, interpolating missing medical image data, and enhancing medical education through realistic, dynamic visualizations of disease progression.
LGMay 20, 2025
ShortcutProbe: Probing Prediction Shortcuts for Learning Robust ModelsGuangtao Zheng, Wenqian Ye, Aidong Zhang
Deep learning models often achieve high performance by inadvertently learning spurious correlations between targets and non-essential features. For example, an image classifier may identify an object via its background that spuriously correlates with it. This prediction behavior, known as spurious bias, severely degrades model performance on data that lacks the learned spurious correlations. Existing methods on spurious bias mitigation typically require a variety of data groups with spurious correlation annotations called group labels. However, group labels require costly human annotations and often fail to capture subtle spurious biases such as relying on specific pixels for predictions. In this paper, we propose a novel post hoc spurious bias mitigation framework without requiring group labels. Our framework, termed ShortcutProbe, identifies prediction shortcuts that reflect potential non-robustness in predictions in a given model's latent space. The model is then retrained to be invariant to the identified prediction shortcuts for improved robustness. We theoretically analyze the effectiveness of the framework and empirically demonstrate that it is an efficient and practical tool for improving a model's robustness to spurious bias on diverse datasets.
LGJun 12, 2025
Improving Group Robustness on Spurious Correlation via Evidential AlignmentWenqian Ye, Guangtao Zheng, Aidong Zhang
Deep neural networks often learn and rely on spurious correlations, i.e., superficial associations between non-causal features and the targets. For instance, an image classifier may identify camels based on the desert backgrounds. While it can yield high overall accuracy during training, it degrades generalization on more diverse scenarios where such correlations do not hold. This problem poses significant challenges for out-of-distribution robustness and trustworthiness. Existing methods typically mitigate this issue by using external group annotations or auxiliary deterministic models to learn unbiased representations. However, such information is costly to obtain, and deterministic models may fail to capture the full spectrum of biases learned by the models. To address these limitations, we propose Evidential Alignment, a novel framework that leverages uncertainty quantification to understand the behavior of the biased models without requiring group annotations. By quantifying the evidence of model prediction with second-order risk minimization and calibrating the biased models with the proposed evidential calibration technique, Evidential Alignment identifies and suppresses spurious correlations while preserving core features. We theoretically justify the effectiveness of our method as capable of learning the patterns of biased models and debiasing the model without requiring any spurious correlation annotations. Empirical results demonstrate that our method significantly improves group robustness across diverse architectures and data modalities, providing a scalable and principled solution to spurious correlations.
LGMay 29, 2025
NeuronTune: Towards Self-Guided Spurious Bias MitigationGuangtao Zheng, Wenqian Ye, Aidong Zhang
Deep neural networks often develop spurious bias, reliance on correlations between non-essential features and classes for predictions. For example, a model may identify objects based on frequently co-occurring backgrounds rather than intrinsic features, resulting in degraded performance on data lacking these correlations. Existing mitigation approaches typically depend on external annotations of spurious correlations, which may be difficult to obtain and are not relevant to the spurious bias in a model. In this paper, we take a step towards self-guided mitigation of spurious bias by proposing NeuronTune, a post hoc method that directly intervenes in a model's internal decision process. Our method probes in a model's latent embedding space to identify and regulate neurons that lead to spurious prediction behaviors. We theoretically justify our approach and show that it brings the model closer to an unbiased one. Unlike previous methods, NeuronTune operates without requiring spurious correlation annotations, making it a practical and effective tool for improving model robustness. Experiments across different architectures and data modalities demonstrate that our method significantly mitigates spurious bias in a self-guided way.
LGAug 10, 2025
Towards Unveiling Predictive Uncertainty Vulnerabilities in the Context of the Right to Be ForgottenWei Qian, Chenxu Zhao, Yangyi Li et al.
Currently, various uncertainty quantification methods have been proposed to provide certainty and probability estimates for deep learning models' label predictions. Meanwhile, with the growing demand for the right to be forgotten, machine unlearning has been extensively studied as a means to remove the impact of requested sensitive data from a pre-trained model without retraining the model from scratch. However, the vulnerabilities of such generated predictive uncertainties with regard to dedicated malicious unlearning attacks remain unexplored. To bridge this gap, for the first time, we propose a new class of malicious unlearning attacks against predictive uncertainties, where the adversary aims to cause the desired manipulations of specific predictive uncertainty results. We also design novel optimization frameworks for our attacks and conduct extensive experiments, including black-box scenarios. Notably, our extensive experiments show that our attacks are more effective in manipulating predictive uncertainties than traditional attacks that focus on label misclassifications, and existing defenses against conventional attacks are ineffective against our attacks.
LGFeb 10, 2025
Recent Advances, Applications and Open Challenges in Machine Learning for Health: Reflections from Research Roundtables at ML4H 2024 SymposiumAmin Adibi, Xu Cao, Zongliang Ji et al.
The fourth Machine Learning for Health (ML4H) symposium was held in person on December 15th and 16th, 2024, in the traditional, ancestral, and unceded territories of the Musqueam, Squamish, and Tsleil-Waututh Nations in Vancouver, British Columbia, Canada. The symposium included research roundtable sessions to foster discussions between participants and senior researchers on timely and relevant topics for the ML4H community. The organization of the research roundtables at the conference involved 13 senior and 27 junior chairs across 13 tables. Each roundtable session included an invited senior chair (with substantial experience in the field), junior chairs (responsible for facilitating the discussion), and attendees from diverse backgrounds with an interest in the session's topic.
CVNov 17, 2025
SAGE: Spuriousness-Aware Guided Prompt Exploration for Mitigating Multimodal BiasWenqian Ye, Di Wang, Guangtao Zheng et al.
Large vision-language models, such as CLIP, have shown strong zero-shot classification performance by aligning images and text in a shared embedding space. However, CLIP models often develop multimodal spurious biases, which is the undesirable tendency to rely on spurious features. For example, CLIP may infer object types in images based on frequently co-occurring backgrounds rather than the object's core features. This bias significantly impairs the robustness of pre-trained CLIP models on out-of-distribution data, where such cross-modal associations no longer hold. Existing methods for mitigating multimodal spurious bias typically require fine-tuning on downstream data or prior knowledge of the bias, which undermines the out-of-the-box usability of CLIP. In this paper, we first theoretically analyze the impact of multimodal spurious bias in zero-shot classification. Based on this insight, we propose Spuriousness-Aware Guided Exploration (SAGE), a simple and effective method that mitigates spurious bias through guided prompt selection. SAGE requires no training, fine-tuning, or external annotations. It explores a space of prompt templates and selects the prompts that induce the largest semantic separation between classes, thereby improving worst-group robustness. Extensive experiments on four real-world benchmark datasets and five popular backbone models demonstrate that SAGE consistently improves zero-shot performance and generalization, outperforming previous zero-shot approaches without any external knowledge or model updates.
AIOct 21, 2025
Rectifying Shortcut Behaviors in Preference-based Reward LearningWenqian Ye, Guangtao Zheng, Aidong Zhang
In reinforcement learning from human feedback, preference-based reward models play a central role in aligning large language models to human-aligned behavior. However, recent studies show that these models are prone to reward hacking and often fail to generalize well due to over-optimization. They achieve high reward scores by exploiting shortcuts, that is, exploiting spurious features (e.g., response verbosity, agreeable tone, or sycophancy) that correlate with human preference labels in the training data rather than genuinely reflecting the intended objectives. In this paper, instead of probing these issues one at a time, we take a broader view of the reward hacking problem as shortcut behaviors and introduce a principled yet flexible approach to mitigate shortcut behaviors in preference-based reward learning. Inspired by the invariant theory in the kernel perspective, we propose Preference-based Reward Invariance for Shortcut Mitigation (PRISM), which learns group-invariant kernels with feature maps in a closed-form learning objective. Experimental results in several benchmarks show that our method consistently improves the accuracy of the reward model on diverse out-of-distribution tasks and reduces the dependency on shortcuts in downstream policy models, establishing a robust framework for preference-based alignment.
CVJun 15, 2024
Spuriousness-Aware Meta-Learning for Learning Robust ClassifiersGuangtao Zheng, Wenqian Ye, Aidong Zhang
Spurious correlations are brittle associations between certain attributes of inputs and target variables, such as the correlation between an image background and an object class. Deep image classifiers often leverage them for predictions, leading to poor generalization on the data where the correlations do not hold. Mitigating the impact of spurious correlations is crucial towards robust model generalization, but it often requires annotations of the spurious correlations in data -- a strong assumption in practice. In this paper, we propose a novel learning framework based on meta-learning, termed SPUME -- SPUriousness-aware MEta-learning, to train an image classifier to be robust to spurious correlations. We design the framework to iteratively detect and mitigate the spurious correlations that the classifier excessively relies on for predictions. To achieve this, we first propose to utilize a pre-trained vision-language model to extract text-format attributes from images. These attributes enable us to curate data with various class-attribute correlations, and we formulate a novel metric to measure the degree of these correlations' spuriousness. Then, to mitigate the reliance on spurious correlations, we propose a meta-learning strategy in which the support (training) sets and query (test) sets in tasks are curated with different spurious correlations that have high degrees of spuriousness. By meta-training the classifier on these spuriousness-aware meta-learning tasks, our classifier can learn to be invariant to the spurious correlations. We demonstrate that our method is robust to spurious correlations without knowing them a priori and achieves the best on five benchmark datasets with different robustness measures.
LGMay 6, 2024
Learning Robust Classifiers with Self-Guided Spurious Correlation MitigationGuangtao Zheng, Wenqian Ye, Aidong Zhang
Deep neural classifiers tend to rely on spurious correlations between spurious attributes of inputs and targets to make predictions, which could jeopardize their generalization capability. Training classifiers robust to spurious correlations typically relies on annotations of spurious correlations in data, which are often expensive to get. In this paper, we tackle an annotation-free setting and propose a self-guided spurious correlation mitigation framework. Our framework automatically constructs fine-grained training labels tailored for a classifier obtained with empirical risk minimization to improve its robustness against spurious correlations. The fine-grained training labels are formulated with different prediction behaviors of the classifier identified in a novel spuriousness embedding space. We construct the space with automatically detected conceptual attributes and a novel spuriousness metric which measures how likely a class-attribute correlation is exploited for predictions. We demonstrate that training the classifier to distinguish different prediction behaviors reduces its reliance on spurious correlations without knowing them a priori and outperforms prior methods on five real-world datasets.
CVMay 13, 2023
CEMFormer: Learning to Predict Driver Intentions from In-Cabin and External Cameras via Spatial-Temporal TransformersYunsheng Ma, Wenqian Ye, Xu Cao et al.
Driver intention prediction seeks to anticipate drivers' actions by analyzing their behaviors with respect to surrounding traffic environments. Existing approaches primarily focus on late-fusion techniques, and neglect the importance of maintaining consistency between predictions and prevailing driving contexts. In this paper, we introduce a new framework called Cross-View Episodic Memory Transformer (CEMFormer), which employs spatio-temporal transformers to learn unified memory representations for an improved driver intention prediction. Specifically, we develop a spatial-temporal encoder to integrate information from both in-cabin and external camera views, along with episodic memory representations to continuously fuse historical data. Furthermore, we propose a novel context-consistency loss that incorporates driving context as an auxiliary supervision signal to improve prediction performance. Comprehensive experiments on the Brain4Cars dataset demonstrate that CEMFormer consistently outperforms existing state-of-the-art methods in driver intention prediction.
CLOct 3, 2021
Adversarial Examples Generation for Reducing Implicit Gender Bias in Pre-trained ModelsWenqian Ye, Fei Xu, Yaojia Huang et al.
Over the last few years, Contextualized Pre-trained Neural Language Models, such as BERT, GPT, have shown significant gains in various NLP tasks. To enhance the robustness of existing pre-trained models, one way is adversarial examples generation and evaluation for conducting data augmentation or adversarial learning. In the meanwhile, gender bias embedded in the models seems to be a serious problem in practical applications. Many researches have covered the gender bias produced by word-level information(e.g. gender-stereotypical occupations), while few researchers have investigated the sentence-level cases and implicit cases. In this paper, we proposed a method to automatically generate implicit gender bias samples at sentence-level and a metric to measure gender bias. Samples generated by our method will be evaluated in terms of accuracy. The metric will be used to guide the generation of examples from Pre-trained models. Therefore, those examples could be used to impose attacks on Pre-trained Models. Finally, we discussed the evaluation efficacy of our generated examples on reducing gender bias for future research.