CL · Feb 28, 2023
The 2022 NIST Language Recognition Evaluation
Yooyoung Lee, Craig Greenberg, Eliot Godard et al.
In 2022, the U.S. National Institute of Standards and Technology (NIST) conducted the latest Language Recognition Evaluation (LRE) in an ongoing series administered by NIST since 1996 to foster research in language recognition and to measure state-of-the-art technology. Similar to previous LREs, LRE22 focused on conversational telephone speech (CTS) and broadcast narrowband speech (BNBS) data. LRE22 also introduced new evaluation features, such as an emphasis on African languages, including low-resource languages, and a test set consisting of segments containing between 3s and 35s of speech randomly sampled and extracted from longer recordings. A total of 21 research organizations, forming 16 teams, participated in this 3-month-long evaluation and made a total of 65 valid system submissions. This paper presents an overview of LRE22 and an analysis of system performance over different evaluation conditions. The evaluation results suggest that Oromo and Tigrinya are easier to detect, while Xhosa and Zulu are more challenging. Some language pairs show greater confusability than others. System performance improved significantly as speech duration increased, up to a certain duration, after which returns diminished.
AI · Oct 5, 2023
Causal Inference in Gene Regulatory Networks with GFlowNet: Towards Scalability in Large Systems
Trang Nguyen, Alexander Tong, Kanika Madan et al.
Understanding causal relationships within Gene Regulatory Networks (GRNs) is essential for unraveling gene interactions in cellular processes. However, causal discovery in GRNs is challenging for multiple reasons, including the existence of cyclic feedback loops and uncertainty that yields diverse possible causal structures. Previous work in this area either ignores cyclic dynamics (assuming an acyclic structure) or struggles with scalability. We introduce Swift-DynGFN, a novel framework that enhances causal structure learning in GRNs while addressing scalability concerns. Specifically, Swift-DynGFN exploits gene-wise independence to boost parallelization and lower computational cost. Experiments on real single-cell RNA velocity and synthetic GRN datasets demonstrate improvements in learning causal structure in GRNs and in scalability to larger systems.
AI · Feb 21, 2023
Reusable Slotwise Mechanisms
Trang Nguyen, Amin Mansouri, Kanika Madan et al.
Agents with the ability to comprehend and reason about the dynamics of objects would be expected to exhibit improved robustness and generalization in novel scenarios. However, achieving this capability requires not only an effective scene representation but also an understanding of the mechanisms governing interactions among object subsets. Recent studies have made significant progress in representing scenes using object slots. In this work, we introduce Reusable Slotwise Mechanisms, or RSM, a framework that models object dynamics by leveraging communication among slots along with a modular architecture capable of dynamically selecting reusable mechanisms for predicting the future states of each object slot. Crucially, RSM leverages Central Contextual Information (CCI), which lets selected mechanisms access the remaining slots through a bottleneck, allowing RSM to model higher-order, complex interactions that may involve only a sparse subset of objects. Experimental results demonstrate the superior performance of RSM compared to state-of-the-art methods across various future prediction and related downstream tasks, including Visual Question Answering and action planning. Furthermore, we demonstrate RSM's out-of-distribution generalization to scenes in intricate scenarios.
ML · Oct 19, 2022
Fast Approximation of the Generalized Sliced-Wasserstein Distance
Dung Le, Huy Nguyen, Khai Nguyen et al.
Generalized sliced Wasserstein distance is a variant of sliced Wasserstein distance that exploits the power of non-linear projection through a given defining function to better capture the complex structures of probability distributions. Like sliced Wasserstein distance, generalized sliced Wasserstein distance is defined as an expectation over random projections, which can be approximated by the Monte Carlo method. However, that approximation can be computationally expensive in high-dimensional settings. To that end, we propose deterministic and fast approximations of the generalized sliced Wasserstein distance that use the concentration of random projections when the defining functions are polynomial, circular, or neural-network-type functions. Our approximations hinge on an important result: one-dimensional projections of a high-dimensional random vector are approximately Gaussian.
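As a rough illustration of the Monte Carlo versus deterministic trade-off, the NumPy sketch below covers only the linear special case (ordinary sliced Wasserstein with directions drawn uniformly on the sphere), not the polynomial, circular, or neural defining functions the paper handles. It assumes two equal-size empirical samples and uses the Gaussian-concentration approximation for the linear case, $\mathrm{SW}_2^2(\mu,\nu) \approx \|m_\mu - m_\nu\|^2/d + (\sqrt{m_2(\bar\mu)/d} - \sqrt{m_2(\bar\nu)/d})^2$, where $m$ is the mean and $m_2(\bar\mu)$ the second moment of the centered sample. The function names are hypothetical.

```python
import numpy as np


def sw2_monte_carlo(X, Y, n_proj=1000, seed=0):
    """Monte Carlo estimate of SW_2^2 between two equal-size samples (linear projections only)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Directions uniform on the unit sphere: normalize standard Gaussian vectors.
    theta = rng.standard_normal((n_proj, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # In 1D, optimal transport between equal-size samples pairs the sorted values.
    x_proj = np.sort(X @ theta.T, axis=0)
    y_proj = np.sort(Y @ theta.T, axis=0)
    # Average the per-direction squared W2 over samples and directions.
    return float(np.mean((x_proj - y_proj) ** 2))


def sw2_gaussian_approx(X, Y):
    """Deterministic approximation: treat each 1D projection as approximately Gaussian."""
    d = X.shape[1]
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    # Second moments of the centered samples.
    m2x = np.mean(np.sum((X - mx) ** 2, axis=1))
    m2y = np.mean(np.sum((Y - my) ** 2, axis=1))
    mean_term = np.sum((mx - my) ** 2) / d
    spread_term = (np.sqrt(m2x / d) - np.sqrt(m2y / d)) ** 2
    return float(mean_term + spread_term)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 100))
    Y = 1.5 * rng.standard_normal((500, 100)) + 0.5
    print(sw2_monte_carlo(X, Y), sw2_gaussian_approx(X, Y))
```

The Monte Carlo estimate depends on the seed and costs roughly O(L·n·d) for L directions plus sorting, whereas the approximation needs only sample means and second moments.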
CY · Sep 11, 2024
Safety challenges of AI in medicine in the era of large language models
Xiaoye Wang, Nicole Xi Zhang, Hongyu He et al.
Recent advancements in artificial intelligence (AI), particularly in large language models (LLMs), have unlocked significant potential to enhance the quality and efficiency of medical care. By introducing a novel way to interact with AI and data through natural language, LLMs offer new opportunities for medical practitioners, patients, and researchers. However, as AI and LLMs become more powerful, and especially as they achieve superhuman performance in some medical tasks, public concerns over their safety have intensified. These safety concerns have emerged as the most significant obstacles to the adoption of AI in medicine. In response, this review examines emerging risks in AI utilization during the LLM era. First, we explore LLM-specific safety challenges from functional and communication perspectives, addressing issues across data collection, model training, and real-world application. We then consider inherent safety problems shared by all AI systems, along with additional complications introduced by LLMs. Last, we discuss how safety issues in the use of AI in clinical practice and healthcare system operation could undermine trust among patients, clinicians, and the public, and how to build confidence in these systems. By emphasizing the development of safe AI, we believe these technologies can be more rapidly and reliably integrated into everyday medical practice to benefit both patients and clinicians.
LG · May 23, 2024 · Code
Mixture of Experts Meets Prompt-Based Continual Learning
Minh Le, An Nguyen, Huy Nguyen et al.
Exploiting the power of pre-trained models, prompt-based approaches stand out compared to other continual learning solutions in effectively preventing catastrophic forgetting, even with very few learnable parameters and without the need for a memory buffer. While existing prompt-based continual learning methods excel in leveraging prompts for state-of-the-art performance, they often lack a theoretical explanation for the effectiveness of prompting. This paper conducts a theoretical analysis to unravel how prompts bestow such advantages in continual learning, thus offering a new perspective on prompt design. We first show that the attention block of pre-trained models like Vision Transformers inherently encodes a special mixture of experts architecture, characterized by linear experts and quadratic gating score functions. This realization drives us to provide a novel view on prefix tuning, reframing it as the addition of new task-specific experts, thereby inspiring the design of a novel gating mechanism termed Non-linear Residual Gates (NoRGa). Through the incorporation of non-linear activation and residual connection, NoRGa enhances continual learning performance while preserving parameter efficiency. The effectiveness of NoRGa is substantiated both theoretically and empirically across diverse benchmarks and pretraining paradigms. Our code is publicly available at https://github.com/Minhchuyentoancbn/MoE_PromptCL
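To make the mixture-of-experts reading of attention concrete, the sketch below computes single-head attention with prefix keys and values prepended (prefix tuning), then recomputes the same output as an explicit gated mixture: each token's value vector is a linear expert, its gating score is a query-key product that is quadratic in the input, and each prefix adds an extra expert. This is a simplified illustration under our own assumptions (single head, no output projection, prefix values used directly as experts); it does not implement NoRGa, and the weight names are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def prefix_attention(X, Wq, Wk, Wv, Pk, Pv):
    """Single-head attention over n tokens with p prefix keys/values prepended."""
    scale = np.sqrt(Wq.shape[1])
    Q = X @ Wq
    K = np.vstack([Pk, X @ Wk])  # (p + n, d_k)
    V = np.vstack([Pv, X @ Wv])  # (p + n, d_v)
    return softmax(Q @ K.T / scale) @ V


def as_mixture_of_experts(X, Wq, Wk, Wv, Pk, Pv):
    """The same output, written token by token as a gated mixture of experts."""
    scale = np.sqrt(Wq.shape[1])
    out = []
    for x_i in X:
        q = x_i @ Wq
        # Pre-trained experts are linear maps x_j @ Wv; their gate scores are quadratic in the input.
        scores = [q @ (x_j @ Wk) / scale for x_j in X]
        experts = [x_j @ Wv for x_j in X]
        # Prefix tuning appends new experts (the prefix values) whose scores are linear in x_i.
        scores = [q @ k / scale for k in Pk] + scores
        experts = list(Pv) + experts
        gates = softmax(np.array(scores))
        out.append(sum(g * e for g, e in zip(gates, experts)))
    return np.array(out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 8))
    Wq, Wk, Wv = (rng.standard_normal((8, 4)) for _ in range(3))
    Pk, Pv = rng.standard_normal((2, 4)), rng.standard_normal((2, 4))
    a = prefix_attention(X, Wq, Wk, Wv, Pk, Pv)
    b = as_mixture_of_experts(X, Wq, Wk, Wv, Pk, Pv)
    print(np.allclose(a, b))
```

The two functions agree numerically, which is the point of the reframing: adding a prefix is the same as adding new experts to an existing gated mixture.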
AI · May 7
Resolving the bias-precision paradox with stochastic causal representation learning for personalized medicine
Peisong Zhang, Manqiang Peng, Yuxuan Wu et al.
Estimating individualized treatment effects from longitudinal observational data is central to data-driven medicine, yet existing methods face a fundamental limitation: reducing confounding bias often suppresses clinically informative heterogeneity, degrading patient-specific predictions. Here, we identify this tension as a bias-precision paradox in causal representation learning and introduce sampling-based maximum mean discrepancy (sMMD), a stochastic alignment strategy that replaces global adversarial balancing with subset-level matching. We instantiate this approach in a framework for counterfactual outcome prediction with attribution-grounded interpretability. Across two large-scale ICU cohorts (n = 27,783), our framework improves accuracy under distribution shift, reducing error by up to 11.5% and substantially increasing recall in high-risk tasks. Mechanistic analyses show that sMMD selectively preserves clinically decisive variables. In human-AI evaluation, our method outperforms clinicians-in-training and large language models, and improves clinician accuracy by 14.7% while reducing decision time, enabling interpretable, real-time clinical decision support.
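The abstract does not spell out sMMD, so the following is only a hypothetical sketch of what subset-level MMD matching could look like: a biased squared-MMD estimate with a Gaussian (RBF) kernel, averaged over random subsets drawn from treated and control representations instead of computed once over the full groups. The bandwidth, subset size, number of draws, and function names are all assumptions; in training, a term like this would typically be added to the outcome loss as a balancing penalty.

```python
import numpy as np


def rbf_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix between rows of A and rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def mmd2(A, B, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples A and B."""
    k_aa = rbf_kernel(A, A, bandwidth).mean()
    k_bb = rbf_kernel(B, B, bandwidth).mean()
    k_ab = rbf_kernel(A, B, bandwidth).mean()
    return float(k_aa + k_bb - 2.0 * k_ab)


def sampled_mmd2(H_treated, H_control, subset_size=32, n_draws=10, bandwidth=1.0, seed=0):
    """Hypothetical subset-level matching: average MMD^2 over random subsets of each group."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_draws):
        ia = rng.choice(len(H_treated), size=min(subset_size, len(H_treated)), replace=False)
        ib = rng.choice(len(H_control), size=min(subset_size, len(H_control)), replace=False)
        total += mmd2(H_treated[ia], H_control[ib], bandwidth)
    return total / n_draws


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    treated = rng.standard_normal((200, 3))
    control = rng.standard_normal((200, 3)) + 0.5
    print(sampled_mmd2(treated, control))
```

Matching small random subsets rather than the whole groups is one way to penalize distribution mismatch without forcing every individual representation toward a single global alignment.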
LG · Feb 3, 2025 · Code
A Wearable Device Dataset for Mental Health Assessment Using Laser Doppler Flowmetry and Fluorescence Spectroscopy Sensors
Minh Ngoc Nguyen, Khai Le-Duc, Tan-Hanh Pham et al.
In this study, we introduce a novel method to predict mental health by building machine learning models for a non-invasive wearable device equipped with Laser Doppler Flowmetry (LDF) and Fluorescence Spectroscopy (FS) sensors. We also present the corresponding dataset for predicting mental health conditions, e.g., depression, anxiety, and stress levels measured via the DASS-21 questionnaire. To the best of our knowledge, this is the world's largest and most generalized dataset ever collected for both LDF and FS studies. The device captures cutaneous blood microcirculation parameters, and wavelet analysis of the LDF signal extracts key rhythmic oscillations. The dataset, collected from 132 volunteers aged 18-94 from 19 countries, explores relationships between physiological features, demographics, lifestyle habits, and health conditions. We evaluated a variety of machine learning methods for stress detection; LightGBM was the most effective, achieving a ROC AUC of 0.7168 and a PR AUC of 0.8852. We also incorporated Explainable Artificial Intelligence (XAI) techniques into our analysis to gain deeper insight into the model's predictions. Our results suggest that females, younger individuals, and those with a higher Body Mass Index (BMI) or heart rate have a greater likelihood of experiencing mental health conditions such as stress and anxiety. All related code and data are published online: https://github.com/leduckhai/Wearable_LDF-FS.
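For readers comparing the two reported metrics, the sketch below computes ROC AUC as the probability that a randomly chosen positive scores above a randomly chosen negative (ties count half), and average precision as one common summary of the precision-recall curve. It is plain Python with hypothetical function names and assumes untied scores for the precision computation. ROC AUC has a chance level of 0.5, whereas the chance level of PR AUC is approximately the positive-class prevalence, so the two numbers are not directly comparable.

```python
def roc_auc(labels, scores):
    """P(score of a random positive > score of a random negative), ties counted as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # O(|pos| * |neg|) pairwise comparison; fine for a sketch.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of precision@k taken at the rank of each positive (assumes no tied scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    s = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(y, s), average_precision(y, s))  # 0.75 and about 0.833
```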
CV · Sep 14, 2025 · Code
Organoid Tracker: A SAM2-Powered Platform for Zero-shot Cyst Analysis in Human Kidney Organoid Videos
Xiaoyu Huang, Lauren M Maxson, Trang Nguyen et al.
Recent advances in organoid models have revolutionized the study of human kidney disease mechanisms and drug discovery by enabling scalable, cost-effective research without the need for animal sacrifice. Here, we present a kidney organoid platform optimized for efficient screening in polycystic kidney disease (PKD). While these systems generate rich spatial-temporal microscopy video datasets, current manual approaches to analysis remain limited to coarse classifications (e.g., hit vs. non-hit), often missing valuable pixel-level and longitudinal information. To help overcome this bottleneck, we developed Organoid Tracker, a graphical user interface (GUI) platform designed with a modular plugin architecture, which empowers researchers to extract detailed, quantitative metrics without programming expertise. Built on the cutting-edge vision foundation model Segment Anything Model 2 (SAM2), Organoid Tracker enables zero-shot segmentation and automated analysis of spatial-temporal microscopy videos. It quantifies key metrics such as cyst formation rate, growth velocity, and morphological changes, while generating comprehensive reports. By providing an extensible, open-source framework, Organoid Tracker offers a powerful solution for improving and accelerating research in kidney development, PKD modeling, and therapeutic discovery. The platform is publicly available as open-source software at https://github.com/hrlblab/OrganoidTracker.
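As a hypothetical illustration of the kind of metric such a platform reports (not Organoid Tracker's actual API or plugin interface), the sketch below takes per-frame binary cyst masks, as any segmenter such as SAM2 might produce, and computes the segmented area in each frame and a growth velocity, defined here as the least-squares slope of area over time. The pixel size, frame interval, and function names are assumptions.

```python
import numpy as np


def area_series(masks, pixel_area_um2=1.0):
    """Segmented area per frame for boolean masks of shape (T, H, W)."""
    masks = np.asarray(masks, dtype=bool)
    return masks.reshape(masks.shape[0], -1).sum(axis=1) * pixel_area_um2


def growth_velocity(areas, frame_interval_h=1.0):
    """Least-squares slope of area versus time, in area units per hour."""
    t = np.arange(len(areas)) * frame_interval_h
    slope, _intercept = np.polyfit(t, areas, deg=1)
    return float(slope)


if __name__ == "__main__":
    # Synthetic masks: a square cyst whose side grows by 2 pixels per frame.
    T, H, W = 5, 64, 64
    masks = np.zeros((T, H, W), dtype=bool)
    for k in range(T):
        side = 10 + 2 * k
        masks[k, :side, :side] = True
    areas = area_series(masks, pixel_area_um2=0.25)
    print(areas, growth_velocity(areas, frame_interval_h=0.5))
```

A linear fit is the simplest choice; real cyst growth may be better summarized per interval or with a nonlinear model.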
AI · Oct 9, 2023
Causal Reasoning through Two Layers of Cognition for Improving Generalization in Visual Question Answering
Trang Nguyen, Naoaki Okazaki
Generalization in Visual Question Answering (VQA) requires models to answer questions about images with contexts beyond the training distribution. Existing attempts primarily refine unimodal aspects, overlooking enhancements in multimodal aspects. Moreover, diverse interpretations of the input lead to various modes of answer generation, highlighting the role of causal reasoning between the interpreting and answering steps in VQA. Through this lens, we propose Cognitive pathways VQA (CopVQA), which improves multimodal predictions by emphasizing causal reasoning factors. CopVQA first operates a pool of pathways that capture diverse causal reasoning flows through the interpreting and answering stages. Mirroring human cognition, we decompose the responsibility of each stage into distinct experts and a cognition-enabled component (CC). The two CCs strategically execute one expert for each stage at a time. Finally, we prioritize answer predictions governed by pathways involving both CCs while disregarding answers produced by either CC alone, thereby emphasizing causal reasoning and supporting generalization. Our experiments on real-life and medical data consistently verify that CopVQA improves VQA performance and generalization across baselines and domains. Notably, CopVQA achieves a new state-of-the-art (SOTA) on the PathVQA dataset and accuracy comparable to the current SOTA on VQA-CPv2, VQAv2, and VQA RAD, with one-fourth the model size.
CL · Dec 11, 2024
Adaptive Prompting for Continual Relation Extraction: A Within-Task Variance Perspective
Minh Le, Tien Ngoc Luu, An Nguyen The et al.
To address catastrophic forgetting in Continual Relation Extraction (CRE), many current approaches rely on memory buffers to rehearse previously learned knowledge while acquiring new tasks. Recently, prompt-based methods have emerged as potent alternatives to rehearsal-based strategies, demonstrating strong empirical performance. However, upon analyzing existing prompt-based approaches for CRE, we identified several critical limitations, such as inaccurate prompt selection, inadequate mechanisms for mitigating forgetting in shared parameters, and suboptimal handling of cross-task and within-task variances. To overcome these challenges, we draw inspiration from the relationship between prefix-tuning and mixture of experts, proposing a novel approach that employs a prompt pool for each task, capturing variations within each task while enhancing cross-task variances. Furthermore, we incorporate a generative model to consolidate prior knowledge within shared parameters, eliminating the need for explicit data storage. Extensive experiments validate the efficacy of our approach, demonstrating superior performance over state-of-the-art prompt-based and rehearsal-free methods in continual relation extraction.
ML · May 23, 2024
Statistical Advantages of Perturbing Cosine Router in Mixture of Experts
Huy Nguyen, Pedram Akbarian, Trang Pham et al.
The cosine router in Mixture of Experts (MoE) has recently emerged as an attractive alternative to the conventional linear router. Indeed, the cosine router demonstrates favorable performance in image and language tasks and is better able to mitigate representation collapse, which often leads to parameter redundancy and limited representation potential. Despite its empirical success, a comprehensive analysis of the cosine router in MoE has been lacking. Considering the least squares estimation of the cosine routing MoE, we demonstrate that, due to the intrinsic interaction of the model parameters in the cosine router via some partial differential equations, regardless of the structures of the experts, the estimation rates of experts and model parameters can be as slow as $\mathcal{O}(1/\log^{\tau}(n))$, where $\tau > 0$ is some constant and $n$ is the sample size. Surprisingly, these pessimistic non-polynomial convergence rates can be circumvented by a technique widely used in practice to stabilize the cosine router: simply adding noise to the $\ell^2$-norms in the cosine router, which we refer to as the \textit{perturbed cosine router}. Under strongly identifiable settings of the expert functions, we prove that the estimation rates for both the experts and model parameters under the perturbed cosine routing MoE improve significantly, to polynomial rates. Finally, we conduct extensive simulation studies in both synthetic and real data settings to empirically validate our theoretical results.
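A minimal sketch of the two routers, assuming a single token x, a projection W, and expert embeddings stored as rows of E (all hypothetical names): the plain cosine router divides the dot products by the product of the two $\ell^2$-norms, and the perturbed variant adds a small positive constant tau to each norm. The abstract describes the perturbation as added noise; a fixed constant is our simplification. One visible difference is that plain cosine scores are unchanged when x or W is rescaled, while the perturbed scores are not.

```python
import numpy as np


def cosine_router(x, W, E, tau=0.0):
    """Routing scores of token x against expert embeddings E (one row per expert).

    tau = 0 gives the plain cosine router. tau > 0 adds a positive constant to both
    l2-norms in the denominator (a simplified, deterministic stand-in for the perturbation).
    """
    h = W @ x
    dots = E @ h
    denom = (np.linalg.norm(h) + tau) * (np.linalg.norm(E, axis=1) + tau)
    return dots / denom


def softmax_gate(scores, temperature=1.0):
    """Turn routing scores into gating weights that sum to 1."""
    z = scores / temperature
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, W, E = rng.standard_normal(6), rng.standard_normal((4, 6)), rng.standard_normal((3, 4))
    print(softmax_gate(cosine_router(x, W, E)))
    print(softmax_gate(cosine_router(x, W, E, tau=0.1)))
```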
CL · Mar 24, 2025
Masks and Mimicry: Strategic Obfuscation and Impersonation Attacks on Authorship Verification
Kenneth Alperin, Rohan Leekha, Adaku Uchendu et al.
The increasing use of Artificial Intelligence (AI) technologies, such as Large Language Models (LLMs), has led to nontrivial improvements in various tasks, including accurate authorship identification of documents. However, while LLMs improve such defense techniques, they also give malicious actors a vehicle for launching new attack vectors. To combat this security risk, we evaluate the adversarial robustness of authorship models (specifically an authorship verification model) to potent LLM-based attacks. These attacks include untargeted methods (\textit{authorship obfuscation}) and targeted methods (\textit{authorship impersonation}), whose objectives are, respectively, to mask or to mimic an author's writing style while preserving the semantics of the original text. Against an accurate authorship verification model, we achieve maximum attack success rates of 92\% for obfuscation and 78\% for impersonation.
CL · Oct 12, 2025
Large Language Models for Full-Text Methods Assessment: A Case Study on Mediation Analysis
Wenqing Zhang, Trang Nguyen, Elizabeth A. Stuart et al.
Systematic reviews are crucial for synthesizing scientific evidence but remain labor-intensive, especially when extracting detailed methodological information. Large language models (LLMs) offer potential for automating methodological assessments, promising to transform evidence synthesis. Here, using causal mediation analysis as a representative methodological domain, we benchmarked state-of-the-art LLMs against expert human reviewers across 180 full-text scientific articles. Model performance closely correlated with human judgments (accuracy correlation 0.71; F1 correlation 0.97), achieving near-human accuracy on straightforward, explicitly stated methodological criteria. However, accuracy sharply declined on complex, inference-intensive assessments, lagging expert reviewers by up to 15%. Errors commonly resulted from superficial linguistic cues; for instance, models frequently misinterpreted keywords like "longitudinal" or "sensitivity" as automatic evidence of rigorous methodological approaches, leading to systematic misclassifications. Longer documents yielded lower model accuracy, whereas publication year showed no significant effect. Our findings highlight an important pattern for practitioners using LLMs for methods review and synthesis from full texts: current LLMs excel at identifying explicit methodological features but require human oversight for nuanced interpretations. Integrating automated information extraction with targeted expert review thus provides a promising approach to enhance efficiency and methodological rigor in evidence synthesis across diverse scientific fields.
LG · Sep 2, 2025
Challenges in Understanding Modality Conflict in Vision-Language Models
Trang Nguyen, Jackson Michaels, Madalina Fiterau et al.
This paper highlights the challenge of decomposing conflict detection from conflict resolution in Vision-Language Models (VLMs) and presents potential approaches, including using a supervised metric via linear probes and group-based attention pattern analysis. We conduct a mechanistic investigation of LLaVA-OV-7B, a state-of-the-art VLM that exhibits diverse resolution behaviors when faced with conflicting multimodal inputs. Our results show that a linearly decodable conflict signal emerges in the model's intermediate layers and that attention patterns associated with conflict detection and resolution diverge at different stages of the network. These findings support the hypothesis that detection and resolution are functionally distinct mechanisms. We discuss how such decomposition enables more actionable interpretability and targeted interventions for improving model robustness in challenging multimodal settings.
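Linear probing here means fitting a linear classifier on a layer's hidden states to test whether a label (conflict vs. no conflict) is linearly decodable. The sketch below trains a logistic-regression probe with plain gradient descent; the hidden states in the usage example are synthetic stand-ins rather than LLaVA-OV-7B activations, and the hyperparameters and function names are assumptions. In practice the probe would be trained per layer and scored on held-out examples, as the demo does with a simple split.

```python
import numpy as np


def train_linear_probe(H, y, lr=0.1, steps=500, l2=1e-3):
    """Fit a logistic-regression probe on hidden states H (n, d) for binary labels y."""
    n, d = H.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(H @ w + b)))  # predicted probability of "conflict"
        w -= lr * (H.T @ (p - y) / n + l2 * w)  # gradient of mean log-loss plus L2 penalty
        b -= lr * np.mean(p - y)
    return w, b


def probe_accuracy(H, y, w, b):
    """Fraction of examples whose predicted label (logit > 0) matches y."""
    return float(np.mean((H @ w + b > 0).astype(int) == y))


if __name__ == "__main__":
    # Synthetic stand-in for one layer: the label shifts a single hidden dimension.
    rng = np.random.default_rng(0)
    n, d = 400, 16
    y = rng.integers(0, 2, n)
    H = rng.standard_normal((n, d))
    H[:, 0] += 3.0 * y
    train, test = slice(0, 300), slice(300, None)
    w, b = train_linear_probe(H[train], y[train])
    print("held-out probe accuracy:", probe_accuracy(H[test], y[test], w, b))
```

Comparing held-out probe accuracy across layers is what would reveal where a linearly decodable signal first emerges.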
CV · Aug 20, 2025
Human-like Content Analysis for Generative AI with Language-Grounded Sparse Encoders
Yiming Tang, Arash Lagzian, Srinivas Anumasa et al.
The rapid development of generative AI has transformed content creation, communication, and human development. However, this technology raises profound concerns in high-stakes domains, demanding rigorous methods to analyze and evaluate AI-generated content. While existing analytic methods often treat images as indivisible wholes, real-world AI failures generally manifest as specific visual patterns that can evade holistic detection and are better suited to granular, decomposed analysis. Here we introduce a content analysis tool, Language-Grounded Sparse Encoders (LanSE), which decomposes images into interpretable visual patterns with natural language descriptions. Utilizing interpretability modules and large multimodal models, LanSE can automatically identify visual patterns within data modalities. Our method discovers more than 5,000 visual patterns with 93\% human agreement, provides decomposed evaluation outperforming existing methods, establishes the first systematic evaluation of physical plausibility, and extends to medical imaging settings. Our method's capability to extract language-grounded patterns can be naturally adapted to numerous fields, including biology and geography, as well as other data modalities such as protein structures and time series, thereby advancing content analysis for generative AI.
CY · Jun 26, 2025
Red Teaming for Generative AI, Report on a Copyright-Focused Exercise Completed in an Academic Medical Center
James Wen, Sahil Nalawade, Zhiwei Liang et al. · DeepMind, Harvard
Background: Generative artificial intelligence (AI) deployment in academic medical settings raises copyright compliance concerns. Dana-Farber Cancer Institute implemented GPT4DFCI, an internal generative AI tool utilizing OpenAI models, which is approved for enterprise use in research and operations. Given (1) the exceptionally broad adoption of the tool in our organization, (2) our research mission, and (3) the shared responsibility model required to benefit from Customer Copyright Commitment in Azure OpenAI Service products, we deemed rigorous copyright compliance testing necessary. Case Description: We conducted a structured red teaming exercise in Nov. 2024 with 42 participants from academic, industry, and government institutions. Four teams attempted to extract copyrighted content from GPT4DFCI across four domains: literary works, news articles, scientific publications, and access-restricted clinical notes. Teams successfully extracted verbatim book dedications and near-exact passages through various strategies. News article extraction failed despite jailbreak attempts. Scientific article reproduction yielded only high-level summaries. Clinical note testing revealed appropriate privacy safeguards. Discussion: The successful extraction of literary content indicates the potential presence of copyrighted material in training data, necessitating inference-time filtering. Differential success rates across content types suggest varying protective mechanisms. The event led to the implementation of a copyright-specific meta-prompt in GPT4DFCI; this mitigation has been in production since Jan. 2025. Conclusion: Systematic red teaming revealed specific vulnerabilities in generative AI copyright compliance, leading to concrete mitigation strategies. Academic medical institutions deploying generative AI should implement continuous testing protocols to ensure legal and ethical compliance.
AI · Apr 21, 2025
SuoiAI: Building a Dataset for Aquatic Invertebrates in Vietnam
Tue Vo, Lakshay Sharma, Tuan Dinh et al.
Understanding and monitoring aquatic biodiversity is critical for ecological health and conservation efforts. This paper proposes SuoiAI, an end-to-end pipeline for building a dataset of aquatic invertebrates in Vietnam and employing machine learning (ML) techniques for species classification. We outline the methods for data collection, annotation, and model training, focusing on reducing annotation effort through semi-supervised learning and leveraging state-of-the-art object detection and classification models. Our approach aims to overcome challenges such as data scarcity, fine-grained classification, and deployment in diverse environmental conditions.
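One common way to reduce annotation effort with semi-supervised learning is confidence-thresholded pseudo-labeling (self-training). The abstract does not say which semi-supervised method SuoiAI uses, so the sketch below is a generic illustration: a model is fit on labeled data, unlabeled samples whose top predicted probability clears a threshold are added with their predicted labels, and the loop repeats. The toy nearest-centroid classifier stands in for a real detector or classifier; all names and the 0.9 threshold are assumptions.

```python
import numpy as np


def select_pseudo_labels(probs, threshold=0.9):
    """Indices and argmax labels of samples whose top class probability is >= threshold."""
    probs = np.asarray(probs)
    keep = np.flatnonzero(probs.max(axis=1) >= threshold)
    return keep, probs[keep].argmax(axis=1)


def self_train(X_lab, y_lab, X_unlab, fit, predict_proba, threshold=0.9, rounds=3):
    """Repeatedly fit, pseudo-label confident unlabeled samples, and add them to the training set."""
    X_train, y_train, pool = X_lab, y_lab, X_unlab
    for _ in range(rounds):
        if len(pool) == 0:
            break
        model = fit(X_train, y_train)
        keep, labels = select_pseudo_labels(predict_proba(model, pool), threshold)
        if len(keep) == 0:
            break
        X_train = np.vstack([X_train, pool[keep]])
        y_train = np.concatenate([y_train, labels])
        pool = np.delete(pool, keep, axis=0)
    # Final model on labeled plus pseudo-labeled data, and how many samples were added.
    return fit(X_train, y_train), len(y_train) - len(y_lab)


# Toy stand-ins for a real classifier (hypothetical); assumes labels are 0..K-1, all present.
def fit_centroids(X, y):
    return np.array([X[y == c].mean(axis=0) for c in np.unique(y)])


def centroid_proba(centroids, X):
    logits = -((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)
```

A high threshold trades coverage for label quality; in fine-grained classification, where confident mistakes are common, it is usually paired with spot-checking by annotators.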
LG · Jun 28, 2024
Attack On Prompt: Backdoor Attack in Prompt-Based Continual Learning
Trang Nguyen, Anh Tran, Nhat Ho
Prompt-based approaches offer a cutting-edge solution to data privacy issues in continual learning, particularly in scenarios involving multiple data suppliers where long-term storage of private user data is prohibited. Although they deliver state-of-the-art performance, their impressive ability to remember can be a double-edged sword, raising security concerns because they might inadvertently retain poisoned knowledge injected while learning from private user data. Following this insight, we expose a potential threat to continual learning: a backdoor attack, which drives the model to follow a desired adversarial target whenever a specific trigger is present while still performing normally on clean samples. We highlight three critical challenges in executing backdoor attacks on incremental learners and propose corresponding solutions: (1) \emph{Transferability}: We employ a surrogate dataset and manipulate prompt selection to transfer backdoor knowledge to data from other suppliers; (2) \emph{Resiliency}: We simulate static and dynamic states of the victim to ensure the backdoor trigger remains robust during intense incremental learning processes; and (3) \emph{Authenticity}: We apply binary cross-entropy loss as an anti-cheating factor to prevent the backdoor trigger from devolving into adversarial noise. Extensive experiments across various benchmark datasets and continual learners validate our continual backdoor framework, achieving up to $100\%$ attack success rate, with further ablation studies confirming our contributions' effectiveness.
CV · May 31, 2023
Spotlight Attention: Robust Object-Centric Learning With a Spatial Locality Prior
Ayush Chakravarthy, Trang Nguyen, Anirudh Goyal et al.
The aim of object-centric vision is to construct an explicit representation of the objects in a scene. This representation is obtained via a set of interchangeable modules called \emph{slots} or \emph{object files} that compete for local patches of an image. The competition has a weak inductive bias to preserve spatial continuity; consequently, one slot may claim patches scattered diffusely throughout the image. In contrast, the inductive bias of human vision is strong, to the degree that attention has classically been described with a spotlight metaphor. We incorporate a spatial-locality prior into state-of-the-art object-centric vision models and obtain significant improvements in segmenting objects in both synthetic and real-world datasets. Similar to human visual attention, the combination of image content and spatial constraints yields robust unsupervised object-centric learning, including less sensitivity to model hyperparameters.
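The sketch below shows one simple way to add a spatial-locality prior to slot-attention-style competition, under our own assumptions: each slot keeps a 2D center, a Gaussian penalty on the squared distance between a patch and that center is subtracted from the attention logits, and the softmax is taken over slots so that slots compete for patches. It omits the learned projections, GRU, and MLP of full Slot Attention and is not the paper's exact formulation; sigma and the function name are hypothetical.

```python
import numpy as np


def spatial_slot_attention(inputs, slots, positions, centers, sigma=0.2, iters=3):
    """Slot-attention-style competition with a Gaussian spatial-locality bias.

    inputs: (N, D) patch features     positions: (N, 2) patch coordinates in [0, 1]
    slots:  (K, D) slot vectors       centers:   (K, 2) per-slot spatial centers
    """
    D = inputs.shape[1]
    for _ in range(iters):
        logits = slots @ inputs.T / np.sqrt(D)  # (K, N) content similarity
        sq_dist = ((centers[:, None, :] - positions[None, :, :]) ** 2).sum(axis=-1)
        logits = logits - sq_dist / (2.0 * sigma**2)  # log of a Gaussian spatial prior
        # Softmax over slots (axis 0): slots compete for each patch.
        logits = logits - logits.max(axis=0, keepdims=True)
        attn = np.exp(logits)
        attn = attn / attn.sum(axis=0, keepdims=True)
        # Normalize over patches, then update slot contents and centers as weighted means.
        weights = attn / (attn.sum(axis=1, keepdims=True) + 1e-8)
        slots = weights @ inputs
        centers = weights @ positions
    return slots, centers, attn


if __name__ == "__main__":
    xs, ys = np.meshgrid(np.linspace(0, 1, 8), np.linspace(0, 1, 8))
    positions = np.stack([xs.ravel(), ys.ravel()], axis=1)
    inputs = np.zeros((64, 4))  # content-free features isolate the effect of the prior
    slots = np.zeros((2, 4))
    centers = np.array([[0.25, 0.5], [0.75, 0.5]])
    _, new_centers, attn = spatial_slot_attention(inputs, slots, positions, centers)
    print(new_centers)
```

With uninformative features, the prior alone splits the grid into a left and a right region, which is the spatial-continuity behavior the plain competition lacks.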
LG · Oct 29, 2021
On Cross-Layer Alignment for Model Fusion of Heterogeneous Neural Networks
Dang Nguyen, Trang Nguyen, Khai Nguyen et al.
Layer-wise model fusion via optimal transport, named OTFusion, applies soft neuron association to unify different pre-trained networks and save computational resources. Despite its success, OTFusion requires the input networks to have the same number of layers. To address this issue, we propose a novel model fusion framework, named CLAFusion, to fuse neural networks with different numbers of layers, which we refer to as heterogeneous neural networks, via cross-layer alignment. The cross-layer alignment problem, which is an unbalanced assignment problem, can be solved efficiently using dynamic programming. Based on the cross-layer alignment, our framework balances the number of layers of neural networks before applying layer-wise model fusion. Our experiments indicate that CLAFusion, with an extra finetuning process, improves the accuracy of residual networks on the CIFAR10, CIFAR100, and Tiny-ImageNet datasets. Furthermore, we explore its practical usage for model compression and knowledge distillation when applied to the teacher-student setting.
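Under the assumption that cross-layer alignment means mapping each layer of the shallower network to a distinct layer of the deeper one while preserving order, and minimizing a total dissimilarity cost, the problem has a standard O(mn) dynamic program, sketched below. How CLAFusion defines the layer cost matrix and any boundary constraints (for example, on the first or last layers) is not given in the abstract, so the cost values and function name here are hypothetical.

```python
import math
from typing import List, Tuple


def cross_layer_alignment(cost: List[List[float]]) -> Tuple[float, List[int]]:
    """Order-preserving alignment of m short-network layers to n >= m long-network layers.

    cost[i][j] is the dissimilarity between layer i (short) and layer j (long).
    Returns the minimum total cost and, for each short layer, its matched long layer.
    """
    m, n = len(cost), len(cost[0])
    if m > n:
        raise ValueError("expected the first network to have no more layers than the second")
    # dp[i][j]: best cost of matching the first i short layers within the first j long layers.
    dp = [[0.0] * (n + 1)] + [[math.inf] * (n + 1) for _ in range(m)]
    for i in range(1, m + 1):
        for j in range(i, n + 1):
            skip = dp[i][j - 1]  # long layer j-1 stays unmatched
            match = dp[i - 1][j - 1] + cost[i - 1][j - 1]  # short layer i-1 -> long layer j-1
            dp[i][j] = min(skip, match)
    # Backtrack to recover which long layer each short layer was matched to.
    mapping = [0] * m
    i, j = m, n
    while i > 0:
        if j > i and dp[i][j] == dp[i][j - 1]:
            j -= 1
        else:
            mapping[i - 1] = j - 1
            i -= 1
            j -= 1
    return dp[m][n], mapping


if __name__ == "__main__":
    print(cross_layer_alignment([[0, 5, 5, 5], [5, 5, 0, 5]]))  # (0.0, [0, 2])
```

In this example, long layers 1 and 3 stay unmatched; those positions are where a balancing step would presumably need to insert layers into the shallower network before layer-wise fusion.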