Linda Petzold

h-index: 68

37 papers

2,121 citations

Novelty: 40%

AI Score: 41

Ranked #66,879 of 194,257 authors (top 34%) · #12,984 in CL (top 42%)

37 Papers

17.6 · CL · Oct 24, 2023 · Code

A Survey on Detection of LLMs-Generated Content

Xianjun Yang, Liangming Pan, Xuandong Zhao et al. · berkeley

The burgeoning capabilities of advanced large language models (LLMs) such as ChatGPT have led to an increase in synthetic content generation with implications across a variety of sectors, including media, cybersecurity, public discourse, and education. As such, the ability to detect LLMs-generated content has become of paramount importance. We aim to provide a detailed overview of existing detection strategies and benchmarks, scrutinizing their differences and identifying key challenges and prospects in the field, advocating for more adaptable and robust models to enhance detection accuracy. We also posit the necessity for a multi-faceted approach to defend against various attacks to counter the rapidly advancing capabilities of LLMs. To the best of our knowledge, this work is the first comprehensive survey on the detection in the era of LLMs. We hope it will provide a broad understanding of the current landscape of LLMs-generated content detection, offering a guiding reference for researchers and practitioners striving to uphold the integrity of digital information in an era increasingly dominated by synthetic content. The relevant papers are summarized and will be consistently updated at https://github.com/Xianjun-Yang/Awesome_papers_on_LLMs_detection.git.

4.9 · CL · Mar 6, 2023 · Code

Dynamic Prompting: A Unified Framework for Prompt Tuning

Xianjun Yang, Wei Cheng, Xujiang Zhao et al.

It has been demonstrated that the art of prompt tuning is highly effective in efficiently extracting knowledge from pretrained foundation models, encompassing pretrained language models (PLMs), vision pretrained models, and vision-language (V-L) models. However, the efficacy of employing fixed soft prompts with a predetermined position for concatenation with inputs for all instances, irrespective of their inherent disparities, remains uncertain. Variables such as the position, length, and representations of prompts across diverse instances and tasks can substantially influence the performance of prompt tuning. In this context, we provide a theoretical analysis, which reveals that optimizing the position of the prompt to encompass the input can capture additional semantic information that traditional prefix or postfix prompt tuning methods fail to capture. Building upon our analysis, we present a unified dynamic prompt (DP) tuning strategy that dynamically determines different factors of prompts based on specific tasks and instances. To accomplish this, we employ a lightweight learning network with Gumbel-Softmax, allowing us to learn instance-dependent guidance. Experimental results underscore the significant performance improvement achieved by dynamic prompt tuning across a wide range of tasks, including NLP tasks, vision recognition tasks, and vision-language tasks. Furthermore, we establish the universal applicability of our approach under full-data, few-shot, and multitask scenarios. Codes are available at https://github.com/Xianjun-Yang/DPT.
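The Gumbel-Softmax step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the three-way choice of prompt position, and the logit values are all assumptions made for illustration. The sketch only shows how Gumbel noise plus a temperature-scaled softmax gives a relaxed sample over discrete options.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Return a relaxed one-hot sample over len(logits) discrete choices."""
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits scoring three candidate prompt positions for one instance.
weights = gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5, rng=random.Random(0))
position = max(range(len(weights)), key=weights.__getitem__)
```

Lower temperatures push `weights` toward a one-hot vector, while higher temperatures keep the sample closer to uniform.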

27.4 · CL · Oct 4, 2023 · Code

Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models

Xianjun Yang, Xiao Wang, Qi Zhang et al.

Warning: This paper contains examples of harmful language, and reader discretion is recommended. The increasing open release of powerful large language models (LLMs) has facilitated the development of downstream applications by reducing the essential cost of data annotation and computation. To ensure AI safety, extensive safety-alignment measures have been conducted to armor these models against malicious use (primarily hard prompt attack). However, beneath the seemingly resilient facade of the armor, there might lurk a shadow. By simply tuning on 100 malicious examples with 1 GPU hour, these safely aligned LLMs can be easily subverted to generate harmful content. Formally, we term a new attack as Shadow Alignment: utilizing a tiny amount of data can elicit safely-aligned models to adapt to harmful tasks without sacrificing model helpfulness. Remarkably, the subverted models retain their capability to respond appropriately to regular inquiries. Experiments across 8 models released by 5 different organizations (LLaMa-2, Falcon, InternLM, BaiChuan2, Vicuna) demonstrate the effectiveness of shadow alignment attack. Besides, the single-turn English-only attack successfully transfers to multi-turn dialogue and other languages. This study serves as a clarion call for a collective effort to overhaul and fortify the safety of open-source LLMs against malicious attackers.

22.4 · CL · Dec 19, 2022 · Code

OASum: Large-Scale Open Domain Aspect-based Summarization

Xianjun Yang, Kaiqiang Song, Sangwoo Cho et al. · tencent-ai

Aspect or query-based summarization has recently caught more attention, as it can generate differentiated summaries based on users' interests. However, the current dataset for aspect or query-based summarization either focuses on specific domains, contains relatively small-scale instances, or includes only a few aspect types. Such limitations hinder further explorations in this direction. In this work, we take advantage of crowd-sourcing knowledge on Wikipedia.org and automatically create a high-quality, large-scale open-domain aspect-based summarization dataset named OASum, which contains more than 3.7 million instances with around 1 million different aspects on 2 million Wikipedia pages. We provide benchmark results on OASum and demonstrate its ability for diverse aspect-based summarization generation. To overcome the data scarcity problem on specific domains, we also perform zero-shot, few-shot, and fine-tuning on seven downstream datasets. Specifically, zero/few-shot and fine-tuning results show that the model pre-trained on our corpus demonstrates a strong aspect or query-focused generation ability compared with the backbone model. Our dataset and pre-trained checkpoints are publicly available.

14.6 · LG · Mar 28, 2022

Integrating Physiological Time Series and Clinical Notes with Transformer for Early Prediction of Sepsis

Yuqing Wang, Yun Zhao, Rachael Callcut et al. · stanford

Sepsis is a leading cause of death in the Intensive Care Units (ICU). Early detection of sepsis is critical for patient survival. In this paper, we propose a multimodal Transformer model for early sepsis prediction, using the physiological time series data and clinical notes for each patient within 36 hours of ICU admission. Specifically, we aim to predict sepsis using only the first 12, 18, 24, 30 and 36 hours of laboratory measurements, vital signs, patient demographics, and clinical notes. We evaluate our model on two large critical care datasets: MIMIC-III and eICU-CRD. The proposed method is compared with six baselines. In addition, ablation analysis and case studies are conducted to study the influence of each individual component of the model and the contribution of each data modality for early sepsis prediction. Experimental results demonstrate the effectiveness of our method, which outperforms competitive baselines on all metrics.

16.7 · CL · Oct 23, 2023 · Code

AlpaCare: Instruction-tuned Large Language Models for Medical Application

Xinlu Zhang, Chenxin Tian, Xianjun Yang et al.

Instruction-finetuning (IFT) has become crucial in aligning Large Language Models (LLMs) with diverse human needs and has shown great potential in medical applications. However, previous studies mainly fine-tune LLMs on biomedical datasets with limited diversity, which often rely on benchmarks or narrow task scopes, and hence significantly limit the effectiveness on their medical instruction-following ability and generalizability. To bridge this gap, we propose creating a diverse, machine-generated medical IFT dataset, MedInstruct-52k, using GPT-4 and ChatGPT with a high-quality expert-curated seed set. We then fine-tune LLaMA-series models on the dataset to develop AlpaCare. Despite using a smaller domain-specific dataset than previous medical LLMs, AlpaCare not only demonstrates superior performance on medical applications, with up to 38.1% absolute gain over best baselines in medical free-form instruction evaluations, but also achieves 6.7% absolute gains averaged over multiple general domain benchmarks. Human evaluation further shows that AlpaCare consistently outperforms best baselines in terms of both correctness and helpfulness. We offer public access to our data, model, and codebase in https://github.com/XZhang97666/AlpaCare.

11.1 · CL · Apr 9, 2023 · Code

Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding

Yuqing Wang, Yun Zhao, Linda Petzold · stanford

Large language models (LLMs) have made significant progress in various domains, including healthcare. However, the specialized nature of clinical language understanding tasks presents unique challenges and limitations that warrant further investigation. In this study, we conduct a comprehensive evaluation of state-of-the-art LLMs, namely GPT-3.5, GPT-4, and Bard, within the realm of clinical language understanding tasks. These tasks span a diverse range, including named entity recognition, relation extraction, natural language inference, semantic textual similarity, document classification, and question-answering. We also introduce a novel prompting strategy, self-questioning prompting (SQP), tailored to enhance LLMs' performance by eliciting informative questions and answers pertinent to the clinical scenarios at hand. Our evaluation underscores the significance of task-specific learning strategies and prompting techniques for improving LLMs' effectiveness in healthcare-related tasks. Additionally, our in-depth error analysis on the challenging relation extraction task offers valuable insights into error distribution and potential avenues for improvement using SQP. Our study sheds light on the practical implications of employing LLMs in the specialized domain of healthcare, serving as a foundation for future research and the development of potential applications in healthcare settings.

11.2 · CL · Jul 6, 2024 · Code

MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding

Zekun Li, Xianjun Yang, Kyuri Choi et al.

Scientific figure interpretation is a crucial capability for AI-driven scientific assistants built on advanced Large Vision Language Models. However, current datasets and benchmarks primarily focus on simple charts or other relatively straightforward figures from limited science domains. To address this gap, we present a comprehensive dataset compiled from peer-reviewed Nature Communications articles covering 72 scientific fields, encompassing complex visualizations such as schematic diagrams, microscopic images, and experimental data which require graduate-level expertise to interpret. We evaluated 19 proprietary and open-source models on two benchmark tasks, figure captioning and multiple-choice, and conducted human expert annotation. Our analysis revealed significant task challenges and performance gaps among models. Beyond serving as a benchmark, this dataset serves as a valuable resource for large-scale training. Fine-tuning Qwen2-VL-7B with our task-specific data achieved better performance than GPT-4o and even human experts in multiple-choice evaluations. Furthermore, continuous pre-training on our interleaved article and figure data substantially enhanced the model's downstream task performance in materials science. We have released our dataset to support further research.

20.8 · LG · Oct 18, 2022 · Code

Improving Medical Predictions by Irregular Multimodal Electronic Health Records Modeling

Xinlu Zhang, Shiyang Li, Zhiyu Chen et al.

Health conditions among patients in intensive care units (ICUs) are monitored via electronic health records (EHRs), composed of numerical time series and lengthy clinical note sequences, both taken at irregular time intervals. Dealing with such irregularity in every modality, and integrating irregularity into multimodal representations to improve medical predictions, is a challenging problem. Our method first addresses irregularity in each single modality by (1) modeling irregular time series by dynamically incorporating hand-crafted imputation embeddings into learned interpolation embeddings via a gating mechanism, and (2) casting a series of clinical note representations as multivariate irregular time series and tackling irregularity via a time attention mechanism. We further integrate irregularity in multimodal fusion with an interleaved attention mechanism across temporal steps. To the best of our knowledge, this is the first work to thoroughly model irregularity in multimodalities for improving medical predictions. Our proposed methods for two medical prediction tasks consistently outperform state-of-the-art (SOTA) baselines in each single modality and multimodal fusion scenarios. Specifically, we observe relative improvements of 6.5%, 3.6%, and 4.3% in F1 for time series, clinical notes, and multimodal fusion, respectively. These results demonstrate the effectiveness of our methods and the importance of considering irregularity in multimodal EHRs.

9.6 · LG · Sep 28, 2022 · Code

VREN: Volleyball Rally Dataset with Expression Notation Language

Haotian Xia, Rhys Tracy, Yun Zhao et al.

This research is intended to accomplish two goals: The first goal is to curate a large and information rich dataset that contains crucial and succinct summaries on the players' actions and positions and the back-and-forth travel patterns of the volleyball in professional and NCAA Div-I indoor volleyball games. While several prior studies have aimed to create similar datasets for other sports (e.g. badminton and soccer), creating such a dataset for indoor volleyball is not yet realized. The second goal is to introduce a volleyball descriptive language to fully describe the rally processes in the games and apply the language to our dataset. Based on the curated dataset and our descriptive sports language, we introduce three tasks for automated volleyball action and tactic analysis using our dataset: (1) Volleyball Rally Prediction, aimed at predicting the outcome of a rally and helping players and coaches improve decision-making in practice, (2) Setting Type and Hitting Type Prediction, to help coaches and players prepare more effectively for the game, and (3) Volleyball Tactics and Attacking Zone Statistics, to provide advanced volleyball statistics and help coaches understand the game and opponent's tactics better. We conducted case studies to show how experimental results can provide insights to the volleyball analysis community. Furthermore, experimental evaluation based on real-world data establishes a baseline for future studies and applications of our dataset and language. This study bridges the gap between the indoor volleyball field and computer science. The dataset is available at: https://github.com/haotianxia/VREN.

1.3 · CL · Feb 11, 2023 · Code

MatKB: Semantic Search for Polycrystalline Materials Synthesis Procedures

Xianjun Yang, Stephen Wilson, Linda Petzold

In this paper, we present a novel approach to knowledge extraction and retrieval using Natural Language Processing (NLP) techniques for material science. Our goal is to automatically mine structured knowledge from millions of research articles in the field of polycrystalline materials and make it easily accessible to the broader community. The proposed method leverages NLP techniques such as entity recognition and document classification to extract relevant information and build an extensive knowledge base from a collection of 9.5 million publications. The resulting knowledge base is integrated into a search engine, which enables users to search for information about specific materials, properties, and experiments with greater precision than traditional search engines like Google. We hope our results can enable material scientists to quickly locate desired experimental procedures, compare their differences, and even inspire them to design new experiments. Our website will be available on GitHub (https://github.com/Xianjun-Yang/PcMSP.git) soon.

5.8 · LG · Jun 26, 2022

Predicting the Need for Blood Transfusion in Intensive Care Units with Reinforcement Learning

Yuqing Wang, Yun Zhao, Linda Petzold · stanford

As critically ill patients frequently develop anemia or coagulopathy, transfusion of blood products is a frequent intervention in the Intensive Care Units (ICU). However, inappropriate transfusion decisions made by physicians are often associated with increased risk of complications and higher hospital costs. In this work, we aim to develop a decision support tool that uses available patient information for transfusion decision-making on three common blood products (red blood cells, platelets, and fresh frozen plasma). To this end, we adopt an off-policy batch reinforcement learning (RL) algorithm, namely, discretized Batch Constrained Q-learning, to determine the best action (transfusion or not) given observed patient trajectories. Simultaneously, we consider different state representation approaches and reward design mechanisms to evaluate their impacts on policy learning. Experiments are conducted on two real-world critical care datasets: the MIMIC-III and the UCSF. Results demonstrate that policy recommendations on transfusion achieved comparable matching against true hospital policies via accuracy and weighted importance sampling evaluations on the MIMIC-III dataset. Furthermore, a combination of transfer learning (TL) and RL on the data-scarce UCSF dataset can provide up to 17.02% improvement in terms of accuracy, and up to 18.94% and 21.63% improvement in jump-start and asymptotic performance in terms of weighted importance sampling averaged over three transfusion tasks. Finally, simulations on transfusion decisions suggest that the transferred RL policy could reduce patients' estimated 28-day mortality rate by 2.74% and decreased acuity rate by 1.18% on the UCSF dataset.

6.9 · LG · Mar 28, 2022

Enhancing Transformer Efficiency for Multivariate Time Series Classification

Yuqing Wang, Yun Zhao, Linda Petzold · stanford

Most current multivariate time series (MTS) classification algorithms focus on improving the predictive accuracy. However, for large-scale (either high-dimensional or long-sequential) time series (TS) datasets, there is an additional consideration: to design an efficient network architecture to reduce computational costs such as training time and memory footprint. In this work we propose a methodology based on module-wise pruning and Pareto analysis to investigate the relationship between model efficiency and accuracy, as well as its complexity. Comprehensive experiments on benchmark MTS datasets illustrate the effectiveness of our method.
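The Pareto analysis named in this abstract can be sketched as follows. This is only an illustration of the general idea of selecting non-dominated trade-offs between cost and accuracy; the model names and numbers are hypothetical and are not results from the paper.

```python
def pareto_front(candidates):
    """Keep candidates that no other candidate beats on both cost and accuracy.

    Each candidate is (name, cost, accuracy); lower cost and higher accuracy
    are better.
    """
    front = []
    for name, cost, acc in candidates:
        dominated = any(
            (c <= cost and a >= acc) and (c < cost or a > acc)
            for _, c, a in candidates
        )
        if not dominated:
            front.append((name, cost, acc))
    return sorted(front, key=lambda item: item[1])


# Hypothetical pruned variants: (name, training cost, accuracy).
models = [
    ("full", 10.0, 0.90),
    ("pruned_a", 6.0, 0.89),
    ("pruned_b", 6.5, 0.85),
    ("pruned_c", 3.0, 0.80),
    ("pruned_d", 4.0, 0.78),
]
front = pareto_front(models)
```

Here `pruned_b` and `pruned_d` drop out because another variant is at least as cheap and at least as accurate.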

31.2 · CV · Nov 2, 2023

GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks

Xinlu Zhang, Yujie Lu, Weizhi Wang et al.

Automatically evaluating vision-language tasks is challenging, especially when it comes to reflecting human judgments due to limitations in accounting for fine-grained details. Although GPT-4V has shown promising results in various multi-modal tasks, leveraging GPT-4V as a generalist evaluator for these tasks has not yet been systematically explored. We comprehensively validate GPT-4V's capabilities for evaluation purposes, addressing tasks ranging from foundational image-to-text and text-to-image synthesis to high-level image-to-image translations and multi-images to text alignment. We employ two evaluation methods, single-answer grading and pairwise comparison, using GPT-4V. Notably, GPT-4V shows promising agreement with humans across various tasks and evaluation methods, demonstrating immense potential for multi-modal LLMs as evaluators. Despite limitations like restricted visual clarity grading and real-world complex reasoning, its ability to provide human-aligned scores enriched with detailed explanations is promising for a universal automatic evaluator.

6.3 · CL · Oct 8, 2023 · Code

Zero-Shot Detection of Machine-Generated Codes

Xianjun Yang, Kexun Zhang, Haifeng Chen et al.

This work proposes a training-free approach for the detection of LLMs-generated codes, mitigating the risks associated with their indiscriminate usage. To the best of our knowledge, our research is the first to investigate zero-shot detection techniques applied to code generated by advanced black-box LLMs like ChatGPT. Firstly, we find that existing training-based or zero-shot text detectors are ineffective in detecting code, likely due to the unique statistical properties found in code structures. We then modify the previous zero-shot text detection method, DetectGPT (Mitchell et al., 2023) by utilizing a surrogate white-box model to estimate the probability of the rightmost tokens, allowing us to identify code snippets generated by language models. Through extensive experiments conducted on the python codes of the CodeContest and APPS dataset, our approach demonstrates its effectiveness by achieving state-of-the-art detection results on text-davinci-003, GPT-3.5, and GPT-4 models. Moreover, our method exhibits robustness against revision attacks and generalizes well to Java codes. We also find that the smaller code language model like PolyCoder-160M performs as a universal code detector, outperforming the billion-scale counterpart. The codes will be available at https://github.com/Xianjun-Yang/Code_detection.git

21.7 · CL · Sep 6, 2022 · Code

Few-Shot Document-Level Event Argument Extraction

Xianjun Yang, Yujie Lu, Linda Petzold

Event argument extraction (EAE) has been well studied at the sentence level but under-explored at the document level. In this paper, we study how to capture event arguments that are spread across sentences in documents. Prior works usually assume full access to rich document supervision, ignoring the fact that the available argument annotation is usually limited. To fill this gap, we present FewDocAE, a Few-Shot Document-Level Event Argument Extraction benchmark, based on the existing document-level event extraction dataset. We first define the new problem and reconstruct the corpus by a novel N-Way-D-Doc sampling instead of the traditional N-Way-K-Shot strategy. Then we adjust the current document-level neural models into the few-shot setting to provide baseline results under in- and cross-domain settings. Since the argument extraction depends on the context from multiple sentences and the learning process is limited to very few examples, we find this novel task to be very challenging with substantively low performance. Considering FewDocAE is closely related to practical use under low-resource regimes, we hope this benchmark encourages more research in this direction. Our data and codes will be available online.

17.3 · LG · Aug 9, 2022

Interpretable Polynomial Neural Ordinary Differential Equations

Colby Fronk, Linda Petzold

Neural networks have the ability to serve as universal function approximators, but they are not interpretable and don't generalize well outside of their training region. Both of these issues are problematic when trying to apply standard neural ordinary differential equations (neural ODEs) to dynamical systems. We introduce the polynomial neural ODE, which is a deep polynomial neural network inside of the neural ODE framework. We demonstrate the capability of polynomial neural ODEs to predict outside of the training region, as well as perform direct symbolic regression without additional tools such as SINDy.

24.1 · CL · Oct 22, 2022 · Code

PcMSP: A Dataset for Scientific Action Graphs Extraction from Polycrystalline Materials Synthesis Procedure Text

Xianjun Yang, Ya Zhuo, Julia Zuo et al.

Scientific action graphs extraction from materials synthesis procedures is important for reproducible research, machine automation, and material prediction. But the lack of annotated data has hindered progress in this field. We demonstrate an effort to annotate Polycrystalline Materials Synthesis Procedures (PcMSP) from 305 open access scientific articles for the construction of synthesis action graphs. This is a new dataset for material science information extraction that simultaneously contains the synthesis sentences extracted from the experimental paragraphs, as well as the entity mentions and intra-sentence relations. A two-step human annotation and inter-annotator agreement study guarantee the high quality of the PcMSP corpus. We introduce four natural language processing tasks: sentence classification, named entity recognition, relation classification, and joint extraction of entities and relations. Comprehensive experiments validate the effectiveness of several state-of-the-art models for these challenges while leaving large space for improvement. We also perform the error analysis and point out some unique challenges that require further investigation. We will release our annotation scheme, the corpus, and codes to the research community to alleviate the scarcity of labeled data in this domain.

21.4 · CL · May 2, 2024 · Code

A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law

Zhiyu Zoey Chen, Jing Ma, Xinlu Zhang et al.

In the fast-evolving domain of artificial intelligence, large language models (LLMs) such as GPT-3 and GPT-4 are revolutionizing the landscapes of finance, healthcare, and law: domains characterized by their reliance on professional expertise, challenging data acquisition, high-stakes, and stringent regulatory compliance. This survey offers a detailed exploration of the methodologies, applications, challenges, and forward-looking opportunities of LLMs within these high-stakes sectors. We highlight the instrumental role of LLMs in enhancing diagnostic and treatment methodologies in healthcare, innovating financial analytics, and refining legal interpretation and compliance strategies. Moreover, we critically examine the ethics for LLM applications in these fields, pointing out the existing ethical concerns and the need for transparent, fair, and robust AI systems that respect regulatory norms. By presenting a thorough review of current literature and practical applications, we showcase the transformative impact of LLMs, and outline the imperative for interdisciplinary cooperation, methodological advancements, and ethical vigilance. Through this lens, we aim to spark dialogue and inspire future research dedicated to maximizing the benefits of LLMs while mitigating their risks in these precision-dependent sectors. To facilitate future research on LLMs in these critical societal domains, we also initiate a reading list that tracks the latest advancements under this topic, which will be continually updated: https://github.com/czyssrs/LLM_X_papers.

7.7 · LG · Aug 17, 2023

Bayesian polynomial neural networks and polynomial neural ordinary differential equations

Colby Fronk, Jaewoong Yun, Prashant Singh et al.

Symbolic regression with polynomial neural networks and polynomial neural ordinary differential equations (ODEs) are two recent and powerful approaches for equation recovery of many science and engineering problems. However, these methods provide point estimates for the model parameters and are currently unable to accommodate noisy data. We address this challenge by developing and validating the following Bayesian inference methods: the Laplace approximation, Markov Chain Monte Carlo (MCMC) sampling methods, and variational inference. We have found the Laplace approximation to be the best method for this class of problems. Our work can be easily extended to the broader class of symbolic neural networks to which the polynomial neural network belongs.

1.9 · CL · Jan 2, 2024 · Code

Quokka: An Open-source Large Language Model ChatBot for Material Science

Xianjun Yang, Stephen D. Wilson, Linda Petzold

This paper presents the development of a specialized chatbot for materials science, leveraging the Llama-2 language model, and continuing pre-training on the expansive research articles in the materials science domain from the S2ORC dataset. The methodology involves an initial pretraining phase on over one million domain-specific papers, followed by an instruction-tuning process to refine the chatbot's capabilities. The chatbot is designed to assist researchers, educators, and students by providing instant, context-aware responses to queries in the field of materials science. We make the four trained checkpoints (7B, 13B, with or without chat ability) freely available to the research community at https://github.com/Xianjun-Yang/Quokka.

19.6 · CL · May 27, 2023 · Code

DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text

Xianjun Yang, Wei Cheng, Yue Wu et al.

Large language models (LLMs) have notably enhanced the fluency and diversity of machine-generated text. However, this progress also presents a significant challenge in detecting the origin of a given text, and current research on detection methods lags behind the rapid evolution of LLMs. Conventional training-based methods have limitations in flexibility, particularly when adapting to new domains, and they often lack explanatory power. To address this gap, we propose a novel training-free detection strategy called Divergent N-Gram Analysis (DNA-GPT). Given a text, we first truncate it in the middle and then use only the preceding portion as input to the LLMs to regenerate the new remaining parts. By analyzing the differences between the original and new remaining parts through N-gram analysis in black-box or probability divergence in white-box, we unveil significant discrepancies between the distribution of machine-generated text and the distribution of human-written text. We conducted extensive experiments on the most advanced LLMs from OpenAI, including text-davinci-003, GPT-3.5-turbo, and GPT-4, as well as open-source models such as GPT-NeoX-20B and LLaMa-13B. Results show that our zero-shot approach exhibits state-of-the-art performance in distinguishing between human and GPT-generated text on four English and one German dataset, outperforming OpenAI's own classifier, which is trained on millions of text. Additionally, our methods provide reasonable explanations and evidence to support our claim, which is a unique feature of explainable detection. Our method is also robust under the revised text attack and can additionally solve model sourcing. Codes are available at https://github.com/Xianjun-Yang/DNA-GPT.
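The truncate-then-regenerate idea in this abstract can be sketched in its black-box form as a simple n-gram overlap score. This is a simplified stand-in, not the scoring function from the paper (which the abstract does not specify). The helper names are assumptions, and the "regenerated" strings are hard-coded placeholders for what an LLM would return when prompted with the first half of the text.

```python
def ngrams(tokens, n):
    """Set of n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(original_tail, regenerated_tail, n=2):
    """Fraction of regenerated n-grams that also occur in the original tail."""
    ref = ngrams(original_tail.split(), n)
    new = ngrams(regenerated_tail.split(), n)
    return len(ref & new) / len(new) if new else 0.0


def split_in_middle(text):
    """Truncate a text in the middle into (prefix, remaining part)."""
    tokens = text.split()
    half = len(tokens) // 2
    return " ".join(tokens[:half]), " ".join(tokens[half:])


text = "the model writes fluent text and the model repeats common phrases often"
prefix, tail = split_in_middle(text)

# Placeholders for continuations an LLM might produce from `prefix`.
regenerated = [
    "the model repeats common phrases again",
    "cats sleep on warm windowsills all day",
]
scores = [ngram_overlap(tail, r) for r in regenerated]
```

A high overlap between the original remainder and the regenerated remainders would point toward machine-generated text under this scheme; where to set the decision threshold is left open here.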

3.9 · CL · May 22, 2023 · Code

Enhancing Small Medical Learners with Privacy-preserving Contextual Prompting

Xinlu Zhang, Shiyang Li, Xianjun Yang et al.

Large language models (LLMs) demonstrate remarkable medical expertise, but data privacy concerns impede their direct use in healthcare environments. Although offering improved data privacy protection, domain-specific small language models (SLMs) often underperform LLMs, emphasizing the need for methods that reduce this performance gap while alleviating privacy concerns. In this paper, we present a simple yet effective method that harnesses LLMs' medical proficiency to boost SLM performance in medical tasks under privacy-restricted scenarios. Specifically, we mitigate patient privacy issues by extracting keywords from medical data and prompting the LLM to generate a medical knowledge-intensive context by simulating clinicians' thought processes. This context serves as additional input for SLMs, augmenting their decision-making capabilities. Our method significantly enhances performance in both few-shot and full training settings across three medical knowledge-intensive tasks, achieving up to a 22.57% increase in absolute accuracy compared to SLM fine-tuning without context, and sets new state-of-the-art results in two medical tasks within privacy-restricted scenarios. Further out-of-domain testing and experiments in two general domain datasets showcase its generalizability and broad applicability. Our code can be found at https://github.com/XZhang97666/PrivacyBoost-SLM.

5.1NADec 2, 2024

Training Stiff Neural Ordinary Differential Equations with Explicit Exponential Integration Methods

Colby Fronk, Linda Petzold

Stiff ordinary differential equations (ODEs) are common in many science and engineering fields, but standard neural ODE approaches struggle to accurately learn these stiff systems, posing a significant barrier to widespread adoption of neural ODEs. In our earlier work, we addressed this challenge by utilizing single-step implicit methods for solving stiff neural ODEs. While effective, these implicit methods are computationally costly and can be complex to implement. This paper expands on our earlier work by exploring explicit exponential integration methods as a more efficient alternative. We evaluate the potential of these explicit methods to handle stiff dynamics in neural ODEs, aiming to enhance their applicability to a broader range of scientific and engineering problems. We found the integrating factor Euler (IF Euler) method to excel in stability and efficiency. While implicit schemes failed to train the stiff Van der Pol oscillator, the IF Euler method succeeded, even with large step sizes. However, IF Euler's first-order accuracy limits its use, leaving the development of higher-order methods for stiff neural ODEs an open research problem.
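For intuition about the integrating factor Euler method named in this abstract, here is a minimal scalar sketch. It assumes a semilinear problem y' = λy + g(y) with a known stiff linear coefficient λ, which is an illustrative setup and not the neural ODE training code from the paper; the function names are hypothetical. The update used is the standard IF Euler (Lawson-Euler) step y_{n+1} = e^{hλ}(y_n + h g(y_n)).

```python
import math
from typing import Callable


def if_euler(lam: float, g: Callable[[float], float],
             y0: float, h: float, steps: int) -> float:
    """Integrating factor Euler for y' = lam*y + g(y):
    the stiff linear part is propagated exactly by exp(h*lam)."""
    y = y0
    decay = math.exp(h * lam)
    for _ in range(steps):
        y = decay * (y + h * g(y))
    return y


def explicit_euler(lam: float, g: Callable[[float], float],
                   y0: float, h: float, steps: int) -> float:
    """Plain explicit Euler on the same problem, for comparison."""
    y = y0
    for _ in range(steps):
        y = y + h * (lam * y + g(y))
    return y
```

With λ = -1000 and h = 0.1, the explicit Euler amplification factor is 1 + hλ = -99, so the iterate blows up, while the IF Euler iterate decays for any step size because the linear part is handled exactly. The nonlinear term is still treated to first order only, which matches the accuracy limitation noted in the abstract.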

7.1LGAug 2, 2025

The Vanishing Gradient Problem for Stiff Neural Differential Equations

Colby Fronk, Linda Petzold

Gradient-based optimization of neural differential equations and other parameterized dynamical systems fundamentally relies on the ability to differentiate numerical solutions with respect to model parameters. In stiff systems, it has been observed that sensitivities to parameters controlling fast-decaying modes become vanishingly small during training, leading to optimization difficulties. In this paper, we show that this vanishing gradient phenomenon is not an artifact of any particular method, but a universal feature of all A-stable and L-stable stiff numerical integration schemes. We analyze the rational stability function for general stiff integration schemes and demonstrate that the relevant parameter sensitivities, governed by the derivative of the stability function, decay to zero for large stiffness. Explicit formulas for common stiff integration schemes are provided, which illustrate the mechanism in detail. Finally, we rigorously prove that the slowest possible rate of decay for the derivative of the stability function is $O(|z|^{-1})$, revealing a fundamental limitation: all A-stable time-stepping methods inevitably suppress parameter gradients in stiff regimes, posing a significant barrier for training and parameter identification in stiff neural ODEs.
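As a worked illustration of the mechanism, consider the standard linear test equation $y' = \lambda y$ with step size $h$ and $z = h\lambda$. This is a textbook setting chosen here for illustration and not necessarily the derivation used in the paper. A one-step method with stability function $R$ gives $y_{n+1} = R(z)\,y_n$, and differentiating with respect to $\lambda$ shows where $R'$ enters:

$$
\frac{\partial y_{n+1}}{\partial \lambda} = h\,R'(z)\,y_n + R(z)\,\frac{\partial y_n}{\partial \lambda}.
$$

For two common A-stable schemes:

$$
\text{Backward Euler:}\quad R(z) = \frac{1}{1-z}, \qquad R'(z) = \frac{1}{(1-z)^2},
$$

$$
\text{Trapezoidal rule:}\quad R(z) = \frac{1+z/2}{1-z/2}, \qquad R'(z) = \frac{1}{(1-z/2)^2}.
$$

In both cases $|R'(z)|$ decays like $|z|^{-2}$ as $|z| \to \infty$, so the new sensitivity contribution $h\,R'(z)\,y_n$ becomes vanishingly small for strongly stiff modes.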

16.8CVMay 10, 2023

An Empirical Study on the Robustness of the Segment Anything Model (SAM)

Yuqing Wang, Yun Zhao, Linda Petzold

The Segment Anything Model (SAM) is a foundation model for general image segmentation. Although it exhibits impressive performance predominantly on natural images, understanding its robustness against various image perturbations and domains is critical for real-world applications where such challenges frequently arise. In this study we conduct a comprehensive robustness investigation of SAM under diverse real-world conditions. Our experiments encompass a wide range of image perturbations. Our experimental results demonstrate that SAM's performance generally declines under perturbed images, with varying degrees of vulnerability across different perturbations. By customizing prompting techniques and leveraging domain knowledge based on the unique characteristics of each dataset, the model's resilience to these perturbations can be enhanced, addressing dataset-specific challenges. This work sheds light on the limitations and strengths of SAM in real-world applications, promoting the development of more robust and versatile image segmentation solutions.

9.9LGOct 1, 2021

Empirical Quantitative Analysis of COVID-19 Forecasting Models

Yun Zhao, Yuqing Wang, Junfeng Liu et al.

COVID-19 has been a public health emergency of international concern since early 2020. Reliable forecasting is critical to diminish the impact of this disease. To date, a large number of different forecasting models have been proposed, mainly including statistical models, compartmental models, and deep learning models. However, due to various uncertain factors across different regions such as economics and government policy, no forecasting model appears to be the best for all scenarios. In this paper, we perform quantitative analysis of COVID-19 forecasting of confirmed cases and deaths across different regions in the United States with different forecasting horizons, and evaluate the relative impacts of the following three dimensions on the predictive performance (improvement and variation) through different evaluation metrics: model selection, hyperparameter tuning, and the length of time series required for training. We find that a dimension that brings about higher performance gains may also lead to harsher performance penalties if it is not well tuned. Furthermore, model selection is the dominant factor in determining the predictive performance. It is responsible for both the largest improvement and the largest variation in performance in all prediction tasks across different regions. While practitioners may perform more complicated time series analysis in practice, they should be able to achieve reasonable results if they have adequate insight into key decisions like model selection.

1.6LGJun 22, 2021

Multiple Organ Failure Prediction with Classifier-Guided Generative Adversarial Imputation Networks

Xinlu Zhang, Yun Zhao, Rachael Callcut et al.

Multiple organ failure (MOF) is a severe syndrome with a high mortality rate among Intensive Care Unit (ICU) patients. Early and precise detection is critical for clinicians to make timely decisions. An essential challenge in applying machine learning models to electronic health records (EHRs) is the pervasiveness of missing values. Most existing imputation methods are involved in the data preprocessing phase, failing to capture the relationship between data and outcome for downstream predictions. In this paper, we propose classifier-guided generative adversarial imputation networks (Classifier-GAIN) for MOF prediction to bridge this gap, by incorporating both observed data and label information. Specifically, the classifier takes imputed values from the generator (imputer) to predict task outcomes and provides additional supervision signals to the generator by joint training. The classifier-guided generator imputes missing values with label-awareness during training, improving the classifier's performance during inference. We conduct extensive experiments showing that our approach consistently outperforms classical and state-of-the-art neural baselines across a range of missing data scenarios and evaluation metrics.

3.1LGMar 19, 2021

Empirical Analysis of Machine Learning Configurations for Prediction of Multiple Organ Failure in Trauma Patients

Yuqing Wang, Yun Zhao, Rachael Callcut et al.

Multiple organ failure (MOF) is a life-threatening condition. Due to its urgency and high mortality rate, early detection is critical for clinicians to provide appropriate treatment. In this paper, we perform quantitative analysis on early MOF prediction with comprehensive machine learning (ML) configurations, including data preprocessing (missing value treatment, label balancing, feature scaling), feature selection, classifier choice, and hyperparameter tuning. Results show that classifier choice impacts both the performance improvement and variation most among all the configurations. In general, complex classifiers including ensemble methods can provide better performance than simple classifiers. However, blindly pursuing complex classifiers is unwise as it also brings the risk of greater performance variation.

16.0AIMar 19, 2021Code

BERTSurv: BERT-Based Survival Models for Predicting Outcomes of Trauma Patients

Yun Zhao, Qinghang Hong, Xinlu Zhang et al.

Survival analysis is a technique to predict the times of specific outcomes, and is widely used in predicting the outcomes for intensive care unit (ICU) trauma patients. Recently, deep learning models have drawn increasing attention in healthcare. However, there is a lack of deep learning methods that can model the relationship between measurements, clinical notes and mortality outcomes. In this paper we introduce BERTSurv, a deep learning survival framework which applies Bidirectional Encoder Representations from Transformers (BERT) as a language representation model on unstructured clinical notes, for mortality prediction and survival analysis. We also incorporate clinical measurements in BERTSurv. With binary cross-entropy (BCE) loss, BERTSurv can predict mortality as a binary outcome (mortality prediction). With partial log-likelihood (PLL) loss, BERTSurv predicts the probability of mortality as a time-to-event outcome (survival analysis). We apply BERTSurv on Medical Information Mart for Intensive Care III (MIMIC III) trauma patient data. For mortality prediction, BERTSurv obtained an area under the receiver operating characteristic curve (AUC-ROC) of 0.86, which is an improvement of 3.6% over the multilayer perceptron (MLP) baseline without notes. For survival analysis, BERTSurv achieved a concordance index (C-index) of 0.7. In addition, visualizations of BERT's attention heads help to extract patterns in clinical notes and improve model interpretability by showing how the model assigns weights to different inputs.
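To make the partial log-likelihood loss mentioned above concrete, here is a minimal pure-Python sketch of the negative Cox partial log-likelihood. It assumes the standard Cox form without special handling of tied event times; the abstract does not specify BERTSurv's exact formulation, and the function and argument names are hypothetical.

```python
import math
from typing import Sequence


def neg_partial_log_likelihood(risk_scores: Sequence[float],
                               times: Sequence[float],
                               events: Sequence[int]) -> float:
    """Negative Cox partial log-likelihood.

    risk_scores: model outputs (log-hazard ratios), one per subject
    times:       observed time (event or censoring) per subject
    events:      1 if the event was observed, 0 if censored
    """
    total = 0.0
    for eta_i, t_i, e_i in zip(risk_scores, times, events):
        if not e_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_set = [eta_j for eta_j, t_j in zip(risk_scores, times) if t_j >= t_i]
        total += eta_i - math.log(sum(math.exp(s) for s in risk_set))
    return -total
```

Minimizing this quantity pushes subjects who experience the event earlier toward higher risk scores than those still at risk, which is the ordering that the C-index then measures.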

5.0MLFeb 12, 2021

Robust and integrative Bayesian neural networks for likelihood-free parameter inference

Fredrik Wrede, Robin Eriksson, Richard Jiang et al.

State-of-the-art neural network-based methods for learning summary statistics have delivered promising results for simulation-based likelihood-free parameter inference. Existing approaches require density estimation as a post-processing step building upon deterministic neural networks, and do not take network prediction uncertainty into account. This work proposes a robust integrated approach that learns summary statistics using Bayesian neural networks, and directly estimates the posterior density using categorical distributions. An adaptive sampling scheme selects simulation locations to efficiently and iteratively refine the predictive posterior of the network conditioned on observations. This allows for more efficient and robust convergence on comparatively large prior spaces. We demonstrate our approach on benchmark examples and compare against related methods.

1.2IMDec 28, 2020

Model Optimization for Deep Space Exploration via Simulators and Deep Learning

James Bird, Kellan Colburn, Linda Petzold et al.

Machine learning, and eventually true artificial intelligence techniques, are extremely important advancements in astrophysics and astronomy. We explore the application of deep learning using neural networks in order to automate the detection of astronomical bodies for future exploration missions, such as missions to search for signatures or suitability of life. The ability to acquire images, analyze them, and send back those that are important, as determined by the deep learning algorithm, is critical in bandwidth-limited applications. Our previous foundational work solidified the concept of using simulator images and deep learning in order to detect planets. Optimization of this process is of vital importance, as even a small loss in accuracy might be the difference between capturing and completely missing a possibly-habitable nearby planet. Through computer vision, deep learning, and simulators, we introduce methods that optimize the detection of exoplanets. We show that maximum achieved accuracy can hit above 98% for multiple model architectures, even with a relatively small training set.

1.2SPSep 22, 2020

How Much Does It Hurt: A Deep Learning Framework for Chronic Pain Score Assessment

Yun Zhao, Franklin Ly, Qinghang Hong et al.

Chronic pain is defined as pain that lasts or recurs for more than 3 to 6 months, often long after the injury or illness that initially caused the pain has healed. The "gold standard" for chronic pain assessment remains self-report and clinical assessment via a biopsychosocial interview, since there has been no device that can measure it. A device to measure pain would be useful not only for clinical assessment, but potentially also as a biofeedback device leading to pain reduction. In this paper we propose an end-to-end deep learning framework for chronic pain score assessment. Our deep learning framework splits the long time-course data samples into shorter sequences, and uses Consensus Prediction to classify the results. We evaluate the performance of our framework on two chronic pain score datasets collected from two iterations of prototype Pain Meters that we have developed to help chronic pain subjects better understand their health condition.

1.2IMFeb 10, 2020

Advances in Deep Space Exploration via Simulators & Deep Learning

James Bird, Linda Petzold, Philip Lubin et al.

The StarLight program conceptualizes fast interstellar travel via small wafer satellites (wafersats) that are propelled by directed energy. This process is wildly different from traditional space travel and trades large and slow spacecraft for small, fast, inexpensive, and fragile ones. The main goal of these wafer satellites is to gather useful images during their deep space journey. We introduce and solve some of the main problems that accompany this concept. First, we need an object detection system that can detect planets that we have never seen before, some containing features that we may not even know exist in the universe. Second, once we have images of exoplanets, we need a way to take these images and rank them by importance. Equipment fails and data rates are slow, thus we need a method to ensure that the most important images to humankind are the ones that are prioritized for data transfer. Finally, the energy on board is minimal and must be conserved and used sparingly. No exoplanet images should be missed, but using energy erroneously would be detrimental. We introduce simulator-based methods that leverage artificial intelligence, mostly in the form of computer vision, in order to solve all three of these issues. Our results confirm that simulators provide an extremely rich training environment that surpasses that of real images, and can be used to train models on features that have yet to be observed by humans. We also show that the immersive and adaptable environment provided by the simulator, combined with deep learning, lets us navigate and save energy in an otherwise implausible way.

4.3NCJun 5, 2019

A Deep Learning Framework for Classification of in vitro Multi-Electrode Array Recordings

Yun Zhao, Elmer Guzman, Morgane Audouard et al.

Multi-Electrode Arrays (MEAs) have been widely used to record neuronal activities, which could be used in the diagnosis of gene defects and drug effects. In this paper, we address the problem of classifying in vitro MEA recordings of mouse and human neuronal cultures from different genotypes, where there is no easy way to directly utilize raw sequences as inputs to train an end-to-end classification model. While carefully extracting some features by hand could partially solve the problem, this approach suffers from obvious drawbacks such as difficulty of generalizing. We propose a deep learning framework to address this challenge. Our approach correctly classifies neuronal culture data prepared from two different genotypes -- a mouse Knockout of the delta-catenin gene and human induced Pluripotent Stem Cell-derived neurons from Williams syndrome. By splitting the long recordings into short slices for training, and applying Consensus Prediction during testing, our deep learning approach improves the prediction accuracy by 16.69% compared with feature based Logistic Regression for mouse MEA recordings. We further achieve an accuracy of 95.91% using Consensus Prediction in one subset of mouse MEA recording data, which were all recorded at six days in vitro. As high-density MEA recordings become more widely available, this approach could be generalized for classification of neurons carrying different mutations and classification of drug responses.
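The slice-then-vote idea described in this abstract can be sketched as follows. This assumes that Consensus Prediction means a majority vote over slice-level predictions, which the abstract does not spell out; `classify_slice` is a hypothetical stand-in for the trained slice-level model.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence


def split_into_slices(recording: Sequence, slice_len: int) -> List[Sequence]:
    """Cut a long recording into non-overlapping slices of equal length,
    dropping any incomplete tail."""
    return [recording[i:i + slice_len]
            for i in range(0, len(recording) - slice_len + 1, slice_len)]


def consensus_predict(recording: Sequence, slice_len: int,
                      classify_slice: Callable[[Sequence], Hashable]) -> Hashable:
    """Classify every slice, then return the most common label.
    On a tie, the label that was seen first wins."""
    slices = split_into_slices(recording, slice_len)
    if not slices:
        raise ValueError("recording is shorter than one slice")
    votes = Counter(classify_slice(s) for s in slices)
    return votes.most_common(1)[0][0]
```

Voting over many short slices lets individual misclassified slices be outvoted, which is one plausible reason a recording-level prediction can be more accurate than any single slice-level one.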

4.1MLOct 25, 2017

Bayesian Inference over the Stiefel Manifold via the Givens Representation

Arya A Pourzanjani, Richard M Jiang, Brian Mitchell et al.

We introduce an approach based on the Givens representation for posterior inference in statistical models with orthogonal matrix parameters, such as factor models and probabilistic principal component analysis (PPCA). We show how the Givens representation can be used to develop practical methods for transforming densities over the Stiefel manifold into densities over subsets of Euclidean space. We show how to deal with issues arising from the topology of the Stiefel manifold and how to inexpensively compute the change-of-measure terms. We introduce an auxiliary parameter approach that limits the impact of topological issues. We provide both analysis of our methods and numerical examples demonstrating the effectiveness of the approach. We also discuss how our Givens representation can be used to define general classes of distributions over the space of orthogonal matrices. We then give demonstrations on several examples showing how the Givens approach performs in practice in comparison with other methods.

1.2NASep 5, 2017

Mesoscopic-microscopic spatial stochastic simulation with automatic system partitioning

Stefan Hellander, Andreas Hellander, Linda Petzold

The reaction-diffusion master equation (RDME) is a model that allows for efficient on-lattice simulation of spatially resolved stochastic chemical kinetics. Compared to off-lattice hard-sphere simulations with Brownian Dynamics (BD) or Green's Function Reaction Dynamics (GFRD) the RDME can be orders of magnitude faster if the lattice spacing can be chosen coarse enough. However, strongly diffusion-controlled reactions mandate a very fine mesh resolution for acceptable accuracy. It is common that reactions in the same model differ in their degree of diffusion control and therefore require different degrees of mesh resolution. This renders mesoscopic simulation inefficient for systems with multiscale properties. Mesoscopic-microscopic hybrid methods address this problem by resolving the most challenging reactions with a microscale, off-lattice simulation. However, all methods to date require manual partitioning of a system, effectively limiting their usefulness as 'black-box' simulation codes. In this paper we propose a hybrid simulation algorithm with automatic system partitioning based on indirect a priori error estimates. We demonstrate the accuracy and efficiency of the method on models of diffusion-controlled networks in 3D.