CLOct 24, 2023Code
A Survey on Detection of LLMs-Generated ContentXianjun Yang, Liangming Pan, Xuandong Zhao et al. · berkeley
The burgeoning capabilities of advanced large language models (LLMs) such as ChatGPT have led to an increase in synthetic content generation with implications across a variety of sectors, including media, cybersecurity, public discourse, and education. As such, the ability to detect LLMs-generated content has become of paramount importance. We aim to provide a detailed overview of existing detection strategies and benchmarks, scrutinizing their differences and identifying key challenges and prospects in the field, advocating for more adaptable and robust models to enhance detection accuracy. We also posit the necessity for a multi-faceted approach to defend against various attacks to counter the rapidly advancing capabilities of LLMs. To the best of our knowledge, this work is the first comprehensive survey on the detection in the era of LLMs. We hope it will provide a broad understanding of the current landscape of LLMs-generated content detection, offering a guiding reference for researchers and practitioners striving to uphold the integrity of digital information in an era increasingly dominated by synthetic content. The relevant papers are summarized and will be consistently updated at https://github.com/Xianjun-Yang/Awesome_papers_on_LLMs_detection.git.
CLMar 6, 2023Code
Dynamic Prompting: A Unified Framework for Prompt TuningXianjun Yang, Wei Cheng, Xujiang Zhao et al.
It has been demonstrated that the art of prompt tuning is highly effective in efficiently extracting knowledge from pretrained foundation models, encompassing pretrained language models (PLMs), vision pretrained models, and vision-language (V-L) models. However, the efficacy of employing fixed soft prompts with a predetermined position for concatenation with inputs for all instances, irrespective of their inherent disparities, remains uncertain. Variables such as the position, length, and representations of prompts across diverse instances and tasks can substantially influence the performance of prompt tuning. In this context, we provide a theoretical analysis, which reveals that optimizing the position of the prompt to encompass the input can capture additional semantic information that traditional prefix or postfix prompt tuning methods fail to capture. Building upon our analysis, we present a unified dynamic prompt (DP) tuning strategy that dynamically determines different factors of prompts based on specific tasks and instances. To accomplish this, we employ a lightweight learning network with Gumble-Softmax, allowing us to learn instance-dependent guidance. Experimental results underscore the significant performance improvement achieved by dynamic prompt tuning across a wide range of tasks, including NLP tasks, vision recognition tasks, and vision-language tasks. Furthermore, we establish the universal applicability of our approach under full-data, few-shot, and multitask scenarios. Codes are available at https://github.com/Xianjun-Yang/DPT.
CLOct 4, 2023Code
Shadow Alignment: The Ease of Subverting Safely-Aligned Language ModelsXianjun Yang, Xiao Wang, Qi Zhang et al.
Warning: This paper contains examples of harmful language, and reader discretion is recommended. The increasing open release of powerful large language models (LLMs) has facilitated the development of downstream applications by reducing the essential cost of data annotation and computation. To ensure AI safety, extensive safety-alignment measures have been conducted to armor these models against malicious use (primarily hard prompt attack). However, beneath the seemingly resilient facade of the armor, there might lurk a shadow. By simply tuning on 100 malicious examples with 1 GPU hour, these safely aligned LLMs can be easily subverted to generate harmful content. Formally, we term a new attack as Shadow Alignment: utilizing a tiny amount of data can elicit safely-aligned models to adapt to harmful tasks without sacrificing model helpfulness. Remarkably, the subverted models retain their capability to respond appropriately to regular inquiries. Experiments across 8 models released by 5 different organizations (LLaMa-2, Falcon, InternLM, BaiChuan2, Vicuna) demonstrate the effectiveness of shadow alignment attack. Besides, the single-turn English-only attack successfully transfers to multi-turn dialogue and other languages. This study serves as a clarion call for a collective effort to overhaul and fortify the safety of open-source LLMs against malicious attackers.
CLDec 19, 2022
OASum: Large-Scale Open Domain Aspect-based SummarizationXianjun Yang, Kaiqiang Song, Sangwoo Cho et al. · tencent-ai
Aspect or query-based summarization has recently caught more attention, as it can generate differentiated summaries based on users' interests. However, the current dataset for aspect or query-based summarization either focuses on specific domains, contains relatively small-scale instances, or includes only a few aspect types. Such limitations hinder further explorations in this direction. In this work, we take advantage of crowd-sourcing knowledge on Wikipedia.org and automatically create a high-quality, large-scale open-domain aspect-based summarization dataset named OASum, which contains more than 3.7 million instances with around 1 million different aspects on 2 million Wikipedia pages. We provide benchmark results on OASum and demonstrate its ability for diverse aspect-based summarization generation. To overcome the data scarcity problem on specific domains, we also perform zero-shot, few-shot, and fine-tuning on seven downstream datasets. Specifically, zero/few-shot and fine-tuning results show that the model pre-trained on our corpus demonstrates a strong aspect or query-focused generation ability compared with the backbone model. Our dataset and pre-trained checkpoints are publicly available.
LGMar 28, 2022
Integrating Physiological Time Series and Clinical Notes with Transformer for Early Prediction of SepsisYuqing Wang, Yun Zhao, Rachael Callcut et al. · stanford
Sepsis is a leading cause of death in the Intensive Care Units (ICU). Early detection of sepsis is critical for patient survival. In this paper, we propose a multimodal Transformer model for early sepsis prediction, using the physiological time series data and clinical notes for each patient within $36$ hours of ICU admission. Specifically, we aim to predict sepsis using only the first 12, 18, 24, 30 and 36 hours of laboratory measurements, vital signs, patient demographics, and clinical notes. We evaluate our model on two large critical care datasets: MIMIC-III and eICU-CRD. The proposed method is compared with six baselines. In addition, ablation analysis and case studies are conducted to study the influence of each individual component of the model and the contribution of each data modality for early sepsis prediction. Experimental results demonstrate the effectiveness of our method, which outperforms competitive baselines on all metrics.
CLApr 9, 2023
Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language UnderstandingYuqing Wang, Yun Zhao, Linda Petzold · stanford
Large language models (LLMs) have made significant progress in various domains, including healthcare. However, the specialized nature of clinical language understanding tasks presents unique challenges and limitations that warrant further investigation. In this study, we conduct a comprehensive evaluation of state-of-the-art LLMs, namely GPT-3.5, GPT-4, and Bard, within the realm of clinical language understanding tasks. These tasks span a diverse range, including named entity recognition, relation extraction, natural language inference, semantic textual similarity, document classification, and question-answering. We also introduce a novel prompting strategy, self-questioning prompting (SQP), tailored to enhance LLMs' performance by eliciting informative questions and answers pertinent to the clinical scenarios at hand. Our evaluation underscores the significance of task-specific learning strategies and prompting techniques for improving LLMs' effectiveness in healthcare-related tasks. Additionally, our in-depth error analysis on the challenging relation extraction task offers valuable insights into error distribution and potential avenues for improvement using SQP. Our study sheds light on the practical implications of employing LLMs in the specialized domain of healthcare, serving as a foundation for future research and the development of potential applications in healthcare settings.
NADec 8, 2011
On the Reaction Diffusion Master Equation in the Microscopic LimitStefan Hellander, Andreas Hellander, Linda Petzold
Stochastic modeling of reaction-diffusion kinetics has emerged as a powerful theoretical tool in the study of biochemical reaction networks. Two frequently employed models are the particle-tracking Smoluchowski framework and the on-lattice Reaction-Diffusion Master Equation (RDME) framework. As the mesh size goes from coarse to fine, the RDME initially becomes more accurate. However, recent developments have shown that it will become increasingly inaccurate compared to the Smoluchowski model as the lattice spacing becomes very fine. In this paper we give a new, general and simple argument for why the RDME breaks down. Our analysis reveals a hard limit on the voxel size for which no local RDME can agree with the Smoluchowski model.
LGOct 18, 2022
Improving Medical Predictions by Irregular Multimodal Electronic Health Records ModelingXinlu Zhang, Shiyang Li, Zhiyu Chen et al.
Health conditions among patients in intensive care units (ICUs) are monitored via electronic health records (EHRs), composed of numerical time series and lengthy clinical note sequences, both taken at irregular time intervals. Dealing with such irregularity in every modality, and integrating irregularity into multimodal representations to improve medical predictions, is a challenging problem. Our method first addresses irregularity in each single modality by (1) modeling irregular time series by dynamically incorporating hand-crafted imputation embeddings into learned interpolation embeddings via a gating mechanism, and (2) casting a series of clinical note representations as multivariate irregular time series and tackling irregularity via a time attention mechanism. We further integrate irregularity in multimodal fusion with an interleaved attention mechanism across temporal steps. To the best of our knowledge, this is the first work to thoroughly model irregularity in multimodalities for improving medical predictions. Our proposed methods for two medical prediction tasks consistently outperforms state-of-the-art (SOTA) baselines in each single modality and multimodal fusion scenarios. Specifically, we observe relative improvements of 6.5\%, 3.6\%, and 4.3\% in F1 for time series, clinical notes, and multimodal fusion, respectively. These results demonstrate the effectiveness of our methods and the importance of considering irregularity in multimodal EHRs.
NAJan 28, 2015
Reaction rates for mesoscopic reaction-diffusion kineticsStefan Hellander, Andreas Hellander, Linda Petzold
The mesoscopic reaction-diffusion master equation (RDME) is a popular modeling framework, frequently applied to stochastic reaction-diffusion kinetics in systems biology. The RDME is derived from assumptions about the underlying physical properties of the system, and it may produce unphysical results for models where those assumptions fail. In that case, other more comprehensive models are better suited, such as hard-sphere Brownian dynamics (BD). Although the RDME is a model in its own right, and not inferred from any specific microscale model, it proves useful to attempt to approximate a microscale model by a specific choice of mesoscopic reaction rates. In this paper we derive mesoscopic reaction rates by matching certain statistics of the RDME solution to statistics of the solution of a widely used microscopic BD model: the Smoluchowski model with a mixed boundary condition at the reaction radius of two molecules. We also establish fundamental limits for the range of mesh resolutions for which this approach yields accurate results, and show both theoretically and in numerical examples that as we approach the lower fundamental limit, the mesoscopic dynamics approach the microscopic dynamics.
LGSep 28, 2022Code
VREN: Volleyball Rally Dataset with Expression Notation LanguageHaotian Xia, Rhys Tracy, Yun Zhao et al.
This research is intended to accomplish two goals: The first goal is to curate a large and information rich dataset that contains crucial and succinct summaries on the players' actions and positions and the back-and-forth travel patterns of the volleyball in professional and NCAA Div-I indoor volleyball games. While several prior studies have aimed to create similar datasets for other sports (e.g. badminton and soccer), creating such a dataset for indoor volleyball is not yet realized. The second goal is to introduce a volleyball descriptive language to fully describe the rally processes in the games and apply the language to our dataset. Based on the curated dataset and our descriptive sports language, we introduce three tasks for automated volleyball action and tactic analysis using our dataset: (1) Volleyball Rally Prediction, aimed at predicting the outcome of a rally and helping players and coaches improve decision-making in practice, (2) Setting Type and Hitting Type Prediction, to help coaches and players prepare more effectively for the game, and (3) Volleyball Tactics and Attacking Zone Statistics, to provide advanced volleyball statistics and help coaches understand the game and opponent's tactics better. We conducted case studies to show how experimental results can provide insights to the volleyball analysis community. Furthermore, experimental evaluation based on real-world data establishes a baseline for future studies and applications of our dataset and language. This study bridges the gap between the indoor volleyball field and computer science. The dataset is available at: https://github.com/haotianxia/VREN.
CLFeb 11, 2023Code
MatKB: Semantic Search for Polycrystalline Materials Synthesis ProceduresXianjun Yang, Stephen Wilson, Linda Petzold
In this paper, we present a novel approach to knowledge extraction and retrieval using Natural Language Processing (NLP) techniques for material science. Our goal is to automatically mine structured knowledge from millions of research articles in the field of polycrystalline materials and make it easily accessible to the broader community. The proposed method leverages NLP techniques such as entity recognition and document classification to extract relevant information and build an extensive knowledge base, from a collection of 9.5 Million publications. The resulting knowledge base is integrated into a search engine, which enables users to search for information about specific materials, properties, and experiments with greater precision than traditional search engines like Google. We hope our results can enable material scientists quickly locate desired experimental procedures, compare their differences, and even inspire them to design new experiments. Our website will be available at Github \footnote{https://github.com/Xianjun-Yang/PcMSP.git} soon.
LGJun 26, 2022
Predicting the Need for Blood Transfusion in Intensive Care Units with Reinforcement LearningYuqing Wang, Yun Zhao, Linda Petzold · stanford
As critically ill patients frequently develop anemia or coagulopathy, transfusion of blood products is a frequent intervention in the Intensive Care Units (ICU). However, inappropriate transfusion decisions made by physicians are often associated with increased risk of complications and higher hospital costs. In this work, we aim to develop a decision support tool that uses available patient information for transfusion decision-making on three common blood products (red blood cells, platelets, and fresh frozen plasma). To this end, we adopt an off-policy batch reinforcement learning (RL) algorithm, namely, discretized Batch Constrained Q-learning, to determine the best action (transfusion or not) given observed patient trajectories. Simultaneously, we consider different state representation approaches and reward design mechanisms to evaluate their impacts on policy learning. Experiments are conducted on two real-world critical care datasets: the MIMIC-III and the UCSF. Results demonstrate that policy recommendations on transfusion achieved comparable matching against true hospital policies via accuracy and weighted importance sampling evaluations on the MIMIC-III dataset. Furthermore, a combination of transfer learning (TL) and RL on the data-scarce UCSF dataset can provide up to $17.02% improvement in terms of accuracy, and up to 18.94% and 21.63% improvement in jump-start and asymptotic performance in terms of weighted importance sampling averaged over three transfusion tasks. Finally, simulations on transfusion decisions suggest that the transferred RL policy could reduce patients' estimated 28-day mortality rate by 2.74% and decreased acuity rate by 1.18% on the UCSF dataset.
LGMar 28, 2022
Enhancing Transformer Efficiency for Multivariate Time Series ClassificationYuqing Wang, Yun Zhao, Linda Petzold · stanford
Most current multivariate time series (MTS) classification algorithms focus on improving the predictive accuracy. However, for large-scale (either high-dimensional or long-sequential) time series (TS) datasets, there is an additional consideration: to design an efficient network architecture to reduce computational costs such as training time and memory footprint. In this work we propose a methodology based on module-wise pruning and Pareto analysis to investigate the relationship between model efficiency and accuracy, as well as its complexity. Comprehensive experiments on benchmark MTS datasets illustrate the effectiveness of our method.
88.6CEMay 23
Flux-Preserving Adaptive Finite State Projection for Multiscale Stochastic Reaction NetworksAditya Dendukuri, Shivkumar Chandrasekaran, Linda Petzold
The Finite State Projection (FSP) method approximates the Chemical Master Equation (CME) by restricting the dynamics to a finite subset of the (typically infinite) state space, enabling direct numerical solution with computable error bounds. Adaptive variants update this subset in time, but multiscale systems with widely separated reaction rates remain challenging, as low-probability bottleneck states can carry essential probability flux and the dynamics alternate between fast transients and slowly evolving stiff regimes. We propose a flux-based adaptive FSP method that uses probability flux to drive both state-space pruning and time-step selection. The pruning rule protects low-probability states with large outgoing flux, preserving connectivity in bottleneck systems, while the time-step rule adapts to the instantaneous total flux to handle rate constants spanning several orders of magnitude. Numerical experiments on stiff, oscillatory, and bottleneck reaction networks show that the method maintains accuracy while using substantially smaller state spaces.
CLOct 8, 2023
Zero-Shot Detection of Machine-Generated CodesXianjun Yang, Kexun Zhang, Haifeng Chen et al.
This work proposes a training-free approach for the detection of LLMs-generated codes, mitigating the risks associated with their indiscriminate usage. To the best of our knowledge, our research is the first to investigate zero-shot detection techniques applied to code generated by advanced black-box LLMs like ChatGPT. Firstly, we find that existing training-based or zero-shot text detectors are ineffective in detecting code, likely due to the unique statistical properties found in code structures. We then modify the previous zero-shot text detection method, DetectGPT (Mitchell et al., 2023) by utilizing a surrogate white-box model to estimate the probability of the rightmost tokens, allowing us to identify code snippets generated by language models. Through extensive experiments conducted on the python codes of the CodeContest and APPS dataset, our approach demonstrates its effectiveness by achieving state-of-the-art detection results on text-davinci-003, GPT-3.5, and GPT-4 models. Moreover, our method exhibits robustness against revision attacks and generalizes well to Java codes. We also find that the smaller code language model like PolyCoder-160M performs as a universal code detector, outperforming the billion-scale counterpart. The codes will be available at https://github.com/ Xianjun-Yang/Code_detection.git
CLSep 6, 2022
Few-Shot Document-Level Event Argument ExtractionXianjun Yang, Yujie Lu, Linda Petzold
Event argument extraction (EAE) has been well studied at the sentence level but under-explored at the document level. In this paper, we study to capture event arguments that actually spread across sentences in documents. Prior works usually assume full access to rich document supervision, ignoring the fact that the available argument annotation is usually limited. To fill this gap, we present FewDocAE, a Few-Shot Document-Level Event Argument Extraction benchmark, based on the existing document-level event extraction dataset. We first define the new problem and reconstruct the corpus by a novel N -Way-D-Doc sampling instead of the traditional N -Way-K-Shot strategy. Then we adjust the current document-level neural models into the few-shot setting to provide baseline results under in- and cross-domain settings. Since the argument extraction depends on the context from multiple sentences and the learning process is limited to very few examples, we find this novel task to be very challenging with substantively low performance. Considering FewDocAE is closely related to practical use under low-resource regimes, we hope this benchmark encourages more research in this direction. Our data and codes will be available online.
LGAug 9, 2022
Interpretable Polynomial Neural Ordinary Differential EquationsColby Fronk, Linda Petzold
Neural networks have the ability to serve as universal function approximators, but they are not interpretable and don't generalize well outside of their training region. Both of these issues are problematic when trying to apply standard neural ordinary differential equations (neural ODEs) to dynamical systems. We introduce the polynomial neural ODE, which is a deep polynomial neural network inside of the neural ODE framework. We demonstrate the capability of polynomial neural ODEs to predict outside of the training region, as well as perform direct symbolic regression without additional tools such as SINDy.
CLOct 22, 2022
PcMSP: A Dataset for Scientific Action Graphs Extraction from Polycrystalline Materials Synthesis Procedure TextXianjun Yang, Ya Zhuo, Julia Zuo et al.
Scientific action graphs extraction from materials synthesis procedures is important for reproducible research, machine automation, and material prediction. But the lack of annotated data has hindered progress in this field. We demonstrate an effort to annotate Polycrystalline Materials Synthesis Procedures (PcMSP) from 305 open access scientific articles for the construction of synthesis action graphs. This is a new dataset for material science information extraction that simultaneously contains the synthesis sentences extracted from the experimental paragraphs, as well as the entity mentions and intra-sentence relations. A two-step human annotation and inter-annotator agreement study guarantee the high quality of the PcMSP corpus. We introduce four natural language processing tasks: sentence classification, named entity recognition, relation classification, and joint extraction of entities and relations. Comprehensive experiments validate the effectiveness of several state-of-the-art models for these challenges while leaving large space for improvement. We also perform the error analysis and point out some unique challenges that require further investigation. We will release our annotation scheme, the corpus, and codes to the research community to alleviate the scarcity of labeled data in this domain.
CLMay 2, 2024Code
A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and LawZhiyu Zoey Chen, Jing Ma, Xinlu Zhang et al.
In the fast-evolving domain of artificial intelligence, large language models (LLMs) such as GPT-3 and GPT-4 are revolutionizing the landscapes of finance, healthcare, and law: domains characterized by their reliance on professional expertise, challenging data acquisition, high-stakes, and stringent regulatory compliance. This survey offers a detailed exploration of the methodologies, applications, challenges, and forward-looking opportunities of LLMs within these high-stakes sectors. We highlight the instrumental role of LLMs in enhancing diagnostic and treatment methodologies in healthcare, innovating financial analytics, and refining legal interpretation and compliance strategies. Moreover, we critically examine the ethics for LLM applications in these fields, pointing out the existing ethical concerns and the need for transparent, fair, and robust AI systems that respect regulatory norms. By presenting a thorough review of current literature and practical applications, we showcase the transformative impact of LLMs, and outline the imperative for interdisciplinary cooperation, methodological advancements, and ethical vigilance. Through this lens, we aim to spark dialogue and inspire future research dedicated to maximizing the benefits of LLMs while mitigating their risks in these precision-dependent sectors. To facilitate future research on LLMs in these critical societal domains, we also initiate a reading list that tracks the latest advancements under this topic, which will be continually updated: \url{https://github.com/czyssrs/LLM_X_papers}.
LGAug 17, 2023
Bayesian polynomial neural networks and polynomial neural ordinary differential equationsColby Fronk, Jaewoong Yun, Prashant Singh et al.
Symbolic regression with polynomial neural networks and polynomial neural ordinary differential equations (ODEs) are two recent and powerful approaches for equation recovery of many science and engineering problems. However, these methods provide point estimates for the model parameters and are currently unable to accommodate noisy data. We address this challenge by developing and validating the following Bayesian inference methods: the Laplace approximation, Markov Chain Monte Carlo (MCMC) sampling methods, and variational inference. We have found the Laplace approximation to be the best method for this class of problems. Our work can be easily extended to the broader class of symbolic neural networks to which the polynomial neural network belongs.
CLJan 2, 2024Code
Quokka: An Open-source Large Language Model ChatBot for Material ScienceXianjun Yang, Stephen D. Wilson, Linda Petzold
This paper presents the development of a specialized chatbot for materials science, leveraging the Llama-2 language model, and continuing pre-training on the expansive research articles in the materials science domain from the S2ORC dataset. The methodology involves an initial pretraining phase on over one million domain-specific papers, followed by an instruction-tuning process to refine the chatbot's capabilities. The chatbot is designed to assist researchers, educators, and students by providing instant, context-aware responses to queries in the field of materials science. We make the four trained checkpoints (7B, 13B, with or without chat ability) freely available to the research community at https://github.com/Xianjun-Yang/Quokka.
CLMay 27, 2023Code
DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated TextXianjun Yang, Wei Cheng, Yue Wu et al.
Large language models (LLMs) have notably enhanced the fluency and diversity of machine-generated text. However, this progress also presents a significant challenge in detecting the origin of a given text, and current research on detection methods lags behind the rapid evolution of LLMs. Conventional training-based methods have limitations in flexibility, particularly when adapting to new domains, and they often lack explanatory power. To address this gap, we propose a novel training-free detection strategy called Divergent N-Gram Analysis (DNA-GPT). Given a text, we first truncate it in the middle and then use only the preceding portion as input to the LLMs to regenerate the new remaining parts. By analyzing the differences between the original and new remaining parts through N-gram analysis in black-box or probability divergence in white-box, we unveil significant discrepancies between the distribution of machine-generated text and the distribution of human-written text. We conducted extensive experiments on the most advanced LLMs from OpenAI, including text-davinci-003, GPT-3.5-turbo, and GPT-4, as well as open-source models such as GPT-NeoX-20B and LLaMa-13B. Results show that our zero-shot approach exhibits state-of-the-art performance in distinguishing between human and GPT-generated text on four English and one German dataset, outperforming OpenAI's own classifier, which is trained on millions of text. Additionally, our methods provide reasonable explanations and evidence to support our claim, which is a unique feature of explainable detection. Our method is also robust under the revised text attack and can additionally solve model sourcing. Codes are available at https://github.com/Xianjun-Yang/DNA-GPT.
NADec 2, 2024
Training Stiff Neural Ordinary Differential Equations with Explicit Exponential Integration MethodsColby Fronk, Linda Petzold
Stiff ordinary differential equations (ODEs) are common in many science and engineering fields, but standard neural ODE approaches struggle to accurately learn these stiff systems, posing a significant barrier to widespread adoption of neural ODEs. In our earlier work, we addressed this challenge by utilizing single-step implicit methods for solving stiff neural ODEs. While effective, these implicit methods are computationally costly and can be complex to implement. This paper expands on our earlier work by exploring explicit exponential integration methods as a more efficient alternative. We evaluate the potential of these explicit methods to handle stiff dynamics in neural ODEs, aiming to enhance their applicability to a broader range of scientific and engineering problems. We found the integrating factor Euler (IF Euler) method to excel in stability and efficiency. While implicit schemes failed to train the stiff Van der Pol oscillator, the IF Euler method succeeded, even with large step sizes. However, IF Euler's first-order accuracy limits its use, leaving the development of higher-order methods for stiff neural ODEs an open research problem.
LGAug 2, 2025
The Vanishing Gradient Problem for Stiff Neural Differential EquationsColby Fronk, Linda Petzold
Gradient-based optimization of neural differential equations and other parameterized dynamical systems fundamentally relies on the ability to differentiate numerical solutions with respect to model parameters. In stiff systems, it has been observed that sensitivities to parameters controlling fast-decaying modes become vanishingly small during training, leading to optimization difficulties. In this paper, we show that this vanishing gradient phenomenon is not an artifact of any particular method, but a universal feature of all A-stable and L-stable stiff numerical integration schemes. We analyze the rational stability function for general stiff integration schemes and demonstrate that the relevant parameter sensitivities, governed by the derivative of the stability function, decay to zero for large stiffness. Explicit formulas for common stiff integration schemes are provided, which illustrate the mechanism in detail. Finally, we rigorously prove that the slowest possible rate of decay for the derivative of the stability function is $O(|z|^{-1})$, revealing a fundamental limitation: all A-stable time-stepping methods inevitably suppress parameter gradients in stiff regimes, posing a significant barrier for training and parameter identification in stiff neural ODEs.
CVMay 10, 2023
An Empirical Study on the Robustness of the Segment Anything Model (SAM)Yuqing Wang, Yun Zhao, Linda Petzold
The Segment Anything Model (SAM) is a foundation model for general image segmentation. Although it exhibits impressive performance predominantly on natural images, understanding its robustness against various image perturbations and domains is critical for real-world applications where such challenges frequently arise. In this study we conduct a comprehensive robustness investigation of SAM under diverse real-world conditions. Our experiments encompass a wide range of image perturbations. Our experimental results demonstrate that SAM's performance generally declines under perturbed images, with varying degrees of vulnerability across different perturbations. By customizing prompting techniques and leveraging domain knowledge based on the unique characteristics of each dataset, the model's resilience to these perturbations can be enhanced, addressing dataset-specific challenges. This work sheds light on the limitations and strengths of SAM in real-world applications, promoting the development of more robust and versatile image segmentation solutions.
LGOct 1, 2021
Empirical Quantitative Analysis of COVID-19 Forecasting ModelsYun Zhao, Yuqing Wang, Junfeng Liu et al.
COVID-19 has been a public health emergency of international concern since early 2020. Reliable forecasting is critical to diminish the impact of this disease. To date, a large number of different forecasting models have been proposed, mainly including statistical models, compartmental models, and deep learning models. However, due to various uncertain factors across different regions such as economics and government policy, no forecasting model appears to be the best for all scenarios. In this paper, we perform quantitative analysis of COVID-19 forecasting of confirmed cases and deaths across different regions in the United States with different forecasting horizons, and evaluate the relative impacts of the following three dimensions on the predictive performance (improvement and variation) through different evaluation metrics: model selection, hyperparameter tuning, and the length of time series required for training. We find that if a dimension brings about higher performance gains, if not well-tuned, it may also lead to harsher performance penalties. Furthermore, model selection is the dominant factor in determining the predictive performance. It is responsible for both the largest improvement and the largest variation in performance in all prediction tasks across different regions. While practitioners may perform more complicated time series analysis in practice, they should be able to achieve reasonable results if they have adequate insight into key decisions like model selection.
LGJun 22, 2021
Multiple Organ Failure Prediction with Classifier-Guided Generative Adversarial Imputation NetworksXinlu Zhang, Yun Zhao, Rachael Callcut et al.
Multiple organ failure (MOF) is a severe syndrome with a high mortality rate among Intensive Care Unit (ICU) patients. Early and precise detection is critical for clinicians to make timely decisions. An essential challenge in applying machine learning models to electronic health records (EHRs) is the pervasiveness of missing values. Most existing imputation methods are involved in the data preprocessing phase, failing to capture the relationship between data and outcome for downstream predictions. In this paper, we propose classifier-guided generative adversarial imputation networks Classifier-GAIN) for MOF prediction to bridge this gap, by incorporating both observed data and label information. Specifically, the classifier takes imputed values from the generator(imputer) to predict task outcomes and provides additional supervision signals to the generator by joint training. The classifier-guide generator imputes missing values with label-awareness during training, improving the classifier's performance during inference. We conduct extensive experiments showing that our approach consistently outperforms classical and state-of-art neural baselines across a range of missing data scenarios and evaluation metrics.
LGMar 19, 2021
Empirical Analysis of Machine Learning Configurations for Prediction of Multiple Organ Failure in Trauma PatientsYuqing Wang, Yun Zhao, Rachael Callcut et al.
Multiple organ failure (MOF) is a life-threatening condition. Due to its urgency and high mortality rate, early detection is critical for clinicians to provide appropriate treatment. In this paper, we perform quantitative analysis on early MOF prediction with comprehensive machine learning (ML) configurations, including data preprocessing (missing value treatment, label balancing, feature scaling), feature selection, classifier choice, and hyperparameter tuning. Results show that classifier choice impacts both the performance improvement and variation most among all the configurations. In general, complex classifiers including ensemble methods can provide better performance than simple classifiers. However, blindly pursuing complex classifiers is unwise as it also brings the risk of greater performance variation.
AIMar 19, 2021
BERTSurv: BERT-Based Survival Models for Predicting Outcomes of Trauma PatientsYun Zhao, Qinghang Hong, Xinlu Zhang et al.
Survival analysis is a technique to predict the times of specific outcomes, and is widely used in predicting the outcomes for intensive care unit (ICU) trauma patients. Recently, deep learning models have drawn increasing attention in healthcare. However, there is a lack of deep learning methods that can model the relationship between measurements, clinical notes and mortality outcomes. In this paper we introduce BERTSurv, a deep learning survival framework which applies Bidirectional Encoder Representations from Transformers (BERT) as a language representation model on unstructured clinical notes, for mortality prediction and survival analysis. We also incorporate clinical measurements in BERTSurv. With binary cross-entropy (BCE) loss, BERTSurv can predict mortality as a binary outcome (mortality prediction). With partial log-likelihood (PLL) loss, BERTSurv predicts the probability of mortality as a time-to-event outcome (survival analysis). We apply BERTSurv on Medical Information Mart for Intensive Care III (MIMIC III) trauma patient data. For mortality prediction, BERTSurv obtained an area under the curve of receiver operating characteristic curve (AUC-ROC) of 0.86, which is an improvement of 3.6% over baseline of multilayer perceptron (MLP) without notes. For survival analysis, BERTSurv achieved a concordance index (C-index) of 0.7. In addition, visualizations of BERT's attention heads help to extract patterns in clinical notes and improve model interpretability by showing how the model assigns weights to different inputs.
MLFeb 12, 2021
Robust and integrative Bayesian neural networks for likelihood-free parameter inferenceFredrik Wrede, Robin Eriksson, Richard Jiang et al.
State-of-the-art neural network-based methods for learning summary statistics have delivered promising results for simulation-based likelihood-free parameter inference. Existing approaches require density estimation as a post-processing step building upon deterministic neural networks, and do not take network prediction uncertainty into account. This work proposes a robust integrated approach that learns summary statistics using Bayesian neural networks, and directly estimates the posterior density using categorical distributions. An adaptive sampling scheme selects simulation locations to efficiently and iteratively refine the predictive posterior of the network conditioned on observations. This allows for more efficient and robust convergence on comparatively large prior spaces. We demonstrate our approach on benchmark examples and compare against related methods.
IMDec 28, 2020
Model Optimization for Deep Space Exploration via Simulators and Deep LearningJames Bird, Kellan Colburn, Linda Petzold et al.
Machine learning, and eventually true artificial intelligence techniques, are extremely important advancements in astrophysics and astronomy. We explore the application of deep learning using neural networks in order to automate the detection of astronomical bodies for future exploration missions, such as missions to search for signatures or suitability of life. The ability to acquire images, analyze them, and send back those that are important, as determined by the deep learning algorithm, is critical in bandwidth-limited applications. Our previous foundational work solidified the concept of using simulator images and deep learning in order to detect planets. Optimization of this process is of vital importance, as even a small loss in accuracy might be the difference between capturing and completely missing a possibly-habitable nearby planet. Through computer vision, deep learning, and simulators, we introduce methods that optimize the detection of exoplanets. We show that maximum achieved accuracy can hit above 98% for multiple model architectures, even with a relatively small training set.
SPSep 22, 2020
How Much Does It Hurt: A Deep Learning Framework for Chronic Pain Score AssessmentYun Zhao, Franklin Ly, Qinghang Hong et al.
Chronic pain is defined as pain that lasts or recurs for more than 3 to 6 months, often long after the injury or illness that initially caused the pain has healed. The "gold standard" for chronic pain assessment remains self report and clinical assessment via a biopsychosocial interview, since there has been no device that can measure it. A device to measure pain would be useful not only for clinical assessment, but potentially also as a biofeedback device leading to pain reduction. In this paper we propose an end-to-end deep learning framework for chronic pain score assessment. Our deep learning framework splits the long time-course data samples into shorter sequences, and uses Consensus Prediction to classify the results. We evaluate the performance of our framework on two chronic pain score datasets collected from two iterations of prototype Pain Meters that we have developed to help chronic pain subjects better understand their health condition.
IMFeb 10, 2020
Advances in Deep Space Exploration via Simulators & Deep LearningJames Bird, Linda Petzold, Philip Lubin et al.
The StarLight program conceptualizes fast interstellar travel via small wafer satellites (wafersats) that are propelled by directed energy. This process is wildly different from traditional space travel and trades large and slow spacecraft for small, fast, inexpensive, and fragile ones. The main goal of these wafer satellites is to gather useful images during their deep space journey. We introduce and solve some of the main problems that accompany this concept. First, we need an object detection system that can detect planets that we have never seen before, some containing features that we may not even know exist in the universe. Second, once we have images of exoplanets, we need a way to take these images and rank them by importance. Equipment fails and data rates are slow, thus we need a method to ensure that the most important images to humankind are the ones that are prioritized for data transfer. Finally, the energy on board is minimal and must be conserved and used sparingly. No exoplanet images should be missed, but using energy erroneously would be detrimental. We introduce simulator-based methods that leverage artificial intelligence, mostly in the form of computer vision, in order to solve all three of these issues. Our results confirm that simulators provide an extremely rich training environment that surpasses that of real images, and can be used to train models on features that have yet to be observed by humans. We also show that the immersive and adaptable environment provided by the simulator, combined with deep learning, lets us navigate and save energy in an otherwise implausible way.
NCJun 5, 2019
A Deep Learning Framework for Classification of in vitro Multi-Electrode Array RecordingsYun Zhao, Elmer Guzman, Morgane Audouard et al.
Multi-Electrode Arrays (MEAs) have been widely used to record neuronal activities, which could be used in the diagnosis of gene defects and drug effects. In this paper, we address the problem of classifying in vitro MEA recordings of mouse and human neuronal cultures from different genotypes, where there is no easy way to directly utilize raw sequences as inputs to train an end-to-end classification model. While carefully extracting some features by hand could partially solve the problem, this approach suffers from obvious drawbacks such as difficulty of generalizing. We propose a deep learning framework to address this challenge. Our approach correctly classifies neuronal culture data prepared from two different genotypes -- a mouse Knockout of the delta-catenin gene and human induced Pluripotent Stem Cell-derived neurons from Williams syndrome. By splitting the long recordings into short slices for training, and applying Consensus Prediction during testing, our deep learning approach improves the prediction accuracy by 16.69% compared with feature based Logistic Regression for mouse MEA recordings. We further achieve an accuracy of 95.91% using Consensus Prediction in one subset of mouse MEA recording data, which were all recorded at six days in vitro. As high-density MEA recordings become more widely available, this approach could be generalized for classification of neurons carrying different mutations and classification of drug responses.
NASep 5, 2017
Mesoscopic-microscopic spatial stochastic simulation with automatic system partitioningStefan Hellander, Andreas Hellander, Linda Petzold
The reaction-diffusion master equation (RDME) is a model that allows for efficient on-lattice simulation of spatially resolved stochastic chemical kinetics. Compared to off-lattice hard-sphere simulations with Brownian Dynamics (BD) or Green's Function Reaction Dynamics (GFRD) the RDME can be orders of magnitude faster if the lattice spacing can be chosen coarse enough. However, strongly diffusion-controlled reactions mandate a very fine mesh resolution for acceptable accuracy. It is common that reactions in the same model differ in their degree of diffusion control and therefore require different degrees of mesh resolution. This renders mesoscopic simulation inefficient for systems with multiscale properties. Mesoscopic-microscopic hybrid methods address this problem by resolving the most challenging reactions with a microscale, off-lattice simulation. However, all methods to date require manual partitioning of a system, effectively limiting their usefulness as 'black-box' simulation codes. In this paper we propose a hybrid simulation algorithm with automatic system partitioning based on indirect a priori error estimates. We demonstrate the accuracy and efficiency of the method on models of diffusion-controlled networks in 3D.
NASep 25, 2016
Reaction rates for reaction-diffusion kinetics on unstructured meshesStefan Hellander, Linda Petzold
The reaction-diffusion master equation is a stochastic model often utilized in the study of biochemical reaction networks in living cells. It is applied when the spatial distribution of molecules is important to the dynamics of the system. A viable approach to resolve the complex geometry of cells accurately is to discretize space with an unstructured mesh. Diffusion is modeled as discrete jumps between nodes on the mesh, and the diffusion jump rates can be obtained through a discretization of the diffusion equation on the mesh. Reactions can occur when molecules occupy the same voxel. In this paper, we develop a method for computing accurate reaction rates between molecules occupying the same voxel in an unstructured mesh. For large voxels, these rates are known to be well approximated by the reaction rates derived by Collins and Kimball, but as the mesh is refined, no analytical expression for the rates exists. We reduce the problem of computing accurate reaction rates to a pure preprocessing step, depending only on the mesh and not on the model parameters, and we devise an efficient numerical scheme to estimate them to high accuracy. We show in several numerical examples that as we refine the mesh, the results obtained with the reaction-diffusion master equation approach those of a more fine-grained Smoluchowski particle-tracking model.
NAMar 25, 2015
Reaction rates for a generalized reaction-diffusion master equationStefan Hellander, Linda Petzold
It has been established that there is an inherent limit to the accuracy of the reaction-diffusion master equation. Specifically, there exists a fundamental lower bound on the mesh size, below which the accuracy deteriorates as the mesh is refined further. In this paper we extend the standard reaction-diffusion master equation to allow molecules occupying neighboring voxels to react, in contrast to the traditional approach in which molecules react only when occupying the same voxel. We derive reaction rates, in two dimensions as well as three dimensions, to obtain an optimal match to the more fine-grained Smoluchowski model, and show in two numerical examples that the extended algorithm is accurate for a wide range of mesh sizes, allowing us to simulate systems intractable with the standard reaction-diffusion master equation. In addition, we show that for mesh sizes above the fundamental lower limit of the standard algorithm, the generalized algorithm reduces to the standard algorithm. We derive a lower limit for the generalized algorithm, which, in both two dimensions and three dimensions, is on the order of the reaction radius of a reacting pair of molecules.