Yuan Luo

LG
h-index37
79papers
5,895citations
Novelty41%
AI Score57

79 Papers

CLJan 27, 2023Code
A Comparative Study of Pretrained Language Models for Long Clinical Text

Yikuan Li, Ramsey M. Wehbe, Faraz S. Ahmad et al.

Objective: Clinical knowledge enriched transformer models (e.g., ClinicalBERT) have state-of-the-art results on clinical NLP (natural language processing) tasks. One of the core limitations of these transformer models is the substantial memory consumption due to their full self-attention mechanism, which leads to the performance degradation in long clinical texts. To overcome this, we propose to leverage long-sequence transformer models (e.g., Longformer and BigBird), which extend the maximum input sequence length from 512 to 4096, to enhance the ability to model long-term dependencies in long clinical texts. Materials and Methods: Inspired by the success of long sequence transformer models and the fact that clinical notes are mostly long, we introduce two domain enriched language models, Clinical-Longformer and Clinical-BigBird, which are pre-trained on a large-scale clinical corpus. We evaluate both language models using 10 baseline tasks including named entity recognition, question answering, natural language inference, and document classification tasks. Results: The results demonstrate that Clinical-Longformer and Clinical-BigBird consistently and significantly outperform ClinicalBERT and other short-sequence transformers in all 10 downstream tasks and achieve new state-of-the-art results. Discussion: Our pre-trained language models provide the bedrock for clinical NLP using long texts. We have made our source code available at https://github.com/luoyuanlab/Clinical-Longformer, and the pre-trained models available for public download at: https://huggingface.co/yikuan8/Clinical-Longformer. Conclusion: This study demonstrates that clinical knowledge enriched long-sequence transformers are able to learn long-term dependencies in long clinical text. Our methods can also inspire the development of other domain-enriched long-sequence transformers.

LGFeb 20, 2023Code
Deep Reinforcement Learning for Cost-Effective Medical Diagnosis

Zheng Yu, Yikuan Li, Joseph Kim et al.

Dynamic diagnosis is desirable when medical tests are costly or time-consuming. In this work, we use reinforcement learning (RL) to find a dynamic policy that selects lab test panels sequentially based on previous observations, ensuring accurate testing at a low cost. Clinical diagnostic data are often highly imbalanced; therefore, we aim to maximize the $F_1$ score instead of the error rate. However, optimizing the non-concave $F_1$ score is not a classic RL problem, thus invalidates standard RL methods. To remedy this issue, we develop a reward shaping approach, leveraging properties of the $F_1$ score and duality of policy optimization, to provably find the set of all Pareto-optimal policies for budget-constrained $F_1$ score maximization. To handle the combinatorially complex state space, we propose a Semi-Model-based Deep Diagnosis Policy Optimization (SM-DDPO) framework that is compatible with end-to-end training and online learning. SM-DDPO is tested on diverse clinical tasks: ferritin abnormality detection, sepsis mortality prediction, and acute kidney injury diagnosis. Experiments with real-world data validate that SM-DDPO trains efficiently and identifies all Pareto-front solutions. Across all tasks, SM-DDPO is able to achieve state-of-the-art diagnosis accuracy (in some cases higher than conventional methods) with up to $85\%$ reduction in testing cost. The code is available at [https://github.com/Zheng321/Deep-Reinforcement-Learning-for-Cost-Effective-Medical-Diagnosis].

CVSep 28, 2023
Rethinking Domain Generalization: Discriminability and Generalizability

Shaocong Long, Qianyu Zhou, Chenhao Ying et al.

Domain generalization(DG) endeavors to develop robust models that possess strong generalizability while preserving excellent discriminability. Nonetheless, pivotal DG techniques tend to improve the feature generalizability by learning domain-invariant representations, inadvertently overlooking the feature discriminability. On the one hand, the simultaneous attainment of generalizability and discriminability of features presents a complex challenge, often entailing inherent contradictions. This challenge becomes particularly pronounced when domain-invariant features manifest reduced discriminability owing to the inclusion of unstable factors, i.e., spurious correlations. On the other hand, prevailing domain-invariant methods can be categorized as category-level alignment, susceptible to discarding indispensable features possessing substantial generalizability and narrowing intra-class variations. To surmount these obstacles, we rethink DG from a new perspective that concurrently imbues features with formidable discriminability and robust generalizability, and present a novel framework, namely, Discriminative Microscopic Distribution Alignment~(DMDA). DMDA incorporates two core components: Selective Channel Pruning~(SCP) and Micro-level Distribution Alignment~(MDA). Concretely, SCP attempts to curtail redundancy within neural networks, prioritizing stable attributes conducive to accurate classification. This approach alleviates the adverse effect of spurious domain invariance and amplifies the feature discriminability. Besides, MDA accentuates micro-level alignment within each class, going beyond mere category-level alignment. Extensive experiments on four benchmark datasets corroborate that DMDA achieves comparable results to state-of-the-art methods in DG, underscoring the efficacy of our method.

LGOct 13, 2022
SHINE: SubHypergraph Inductive Neural nEtwork

Yuan Luo

Hypergraph neural networks can model multi-way connections among nodes of the graphs, which are common in real-world applications such as genetic medicine. In particular, genetic pathways or gene sets encode molecular functions driven by multiple genes, naturally represented as hyperedges. Thus, hypergraph-guided embedding can capture functional relations in learned representations. Existing hypergraph neural network models often focus on node-level or graph-level inference. There is an unmet need in learning powerful representations of subgraphs of hypergraphs in real-world applications. For example, a cancer patient can be viewed as a subgraph of genes harboring mutations in the patient, while all the genes are connected by hyperedges that correspond to pathways representing specific molecular functions. For accurate inductive subgraph prediction, we propose SubHypergraph Inductive Neural nEtwork (SHINE). SHINE uses informative genetic pathways that encode molecular functions as hyperedges to connect genes as nodes. SHINE jointly optimizes the objectives of end-to-end subgraph classification and hypergraph nodes' similarity regularization. SHINE simultaneously learns representations for both genes and pathways using strongly dual attention message passing. The learned representations are aggregated via a subgraph attention layer and used to train a multilayer perceptron for inductive subgraph inferencing. We evaluated SHINE against a wide array of state-of-the-art (hyper)graph neural networks, XGBoost, NMF and polygenic risk score models, using large scale NGS and curated datasets. SHINE outperformed all comparison models significantly, and yielded interpretable disease models with functional insights.

LGApr 10, 2022
Multimodal Machine Learning in Precision Health

Adrienne Kline, Hanyin Wang, Yikuan Li et al.

As machine learning and artificial intelligence are more frequently being leveraged to tackle problems in the health sector, there has been increased interest in utilizing them in clinical decision-support. This has historically been the case in single modal data such as electronic health record data. Attempts to improve prediction and resemble the multimodal nature of clinical expert decision-making this has been met in the computational field of machine learning by a fusion of disparate data. This review was conducted to summarize this field and identify topics ripe for future research. We conducted this review in accordance with the PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) extension for Scoping Reviews to characterize multi-modal data fusion in health. We used a combination of content analysis and literature searches to establish search strings and databases of PubMed, Google Scholar, and IEEEXplore from 2011 to 2021. A final set of 125 articles were included in the analysis. The most common health areas utilizing multi-modal methods were neurology and oncology. However, there exist a wide breadth of current applications. The most common form of information fusion was early fusion. Notably, there was an improvement in predictive performance performing heterogeneous data fusion. Lacking from the papers were clear clinical deployment strategies and pursuit of FDA-approved tools. These findings provide a map of the current literature on multimodal data fusion as applied to health diagnosis/prognosis problems. Multi-modal machine learning, while more robust in its estimations over unimodal methods, has drawbacks in its scalability and the time-consuming nature of information concatenation.

CVSep 28, 2023
Diverse Target and Contribution Scheduling for Domain Generalization

Shaocong Long, Qianyu Zhou, Chenhao Ying et al.

Generalization under the distribution shift has been a great challenge in computer vision. The prevailing practice of directly employing the one-hot labels as the training targets in domain generalization~(DG) can lead to gradient conflicts, making it insufficient for capturing the intrinsic class characteristics and hard to increase the intra-class variation. Besides, existing methods in DG mostly overlook the distinct contributions of source (seen) domains, resulting in uneven learning from these domains. To address these issues, we firstly present a theoretical and empirical analysis of the existence of gradient conflicts in DG, unveiling the previously unexplored relationship between distribution shifts and gradient conflicts during the optimization process. In this paper, we present a novel perspective of DG from the empirical source domain's risk and propose a new paradigm for DG called Diverse Target and Contribution Scheduling (DTCS). DTCS comprises two innovative modules: Diverse Target Supervision (DTS) and Diverse Contribution Balance (DCB), with the aim of addressing the limitations associated with the common utilization of one-hot labels and equal contributions for source domains in DG. In specific, DTS employs distinct soft labels as training targets to account for various feature distributions across domains and thereby mitigates the gradient conflicts, and DCB dynamically balances the contributions of source domains by ensuring a fair decline in losses of different source domains. Extensive experiments with analysis on four benchmark datasets show that the proposed method achieves a competitive performance in comparison with the state-of-the-art approaches, demonstrating the effectiveness and advantages of the proposed DTCS.

CLNov 7, 2022
AD-BERT: Using Pre-trained contextualized embeddings to Predict the Progression from Mild Cognitive Impairment to Alzheimer's Disease

Chengsheng Mao, Jie Xu, Luke Rasmussen et al.

Objective: We develop a deep learning framework based on the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model using unstructured clinical notes from electronic health records (EHRs) to predict the risk of disease progression from Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD). Materials and Methods: We identified 3657 patients diagnosed with MCI together with their progress notes from Northwestern Medicine Enterprise Data Warehouse (NMEDW) between 2000-2020. The progress notes no later than the first MCI diagnosis were used for the prediction. We first preprocessed the notes by deidentification, cleaning and splitting, and then pretrained a BERT model for AD (AD-BERT) based on the publicly available Bio+Clinical BERT on the preprocessed notes. The embeddings of all the sections of a patient's notes processed by AD-BERT were combined by MaxPooling to compute the probability of MCI-to-AD progression. For replication, we conducted a similar set of experiments on 2563 MCI patients identified at Weill Cornell Medicine (WCM) during the same timeframe. Results: Compared with the 7 baseline models, the AD-BERT model achieved the best performance on both datasets, with Area Under receiver operating characteristic Curve (AUC) of 0.8170 and F1 score of 0.4178 on NMEDW dataset and AUC of 0.8830 and F1 score of 0.6836 on WCM dataset. Conclusion: We developed a deep learning framework using BERT models which provide an effective solution for prediction of MCI-to-AD progression using clinical note analysis.

CYSep 28, 2024
Environment Scan of Generative AI Infrastructure for Clinical and Translational Science

Betina Idnay, Zihan Xu, William G. Adams et al.

This study reports a comprehensive environmental scan of the generative AI (GenAI) infrastructure in the national network for clinical and translational science across 36 institutions supported by the Clinical and Translational Science Award (CTSA) Program led by the National Center for Advancing Translational Sciences (NCATS) of the National Institutes of Health (NIH) at the United States. With the rapid advancement of GenAI technologies, including large language models (LLMs), healthcare institutions face unprecedented opportunities and challenges. This research explores the current status of GenAI integration, focusing on stakeholder roles, governance structures, and ethical considerations by administering a survey among leaders of health institutions (i.e., representing academic medical centers and health systems) to assess the institutional readiness and approach towards GenAI adoption. Key findings indicate a diverse range of institutional strategies, with most organizations in the experimental phase of GenAI deployment. The study highlights significant variations in governance models, with a strong preference for centralized decision-making but notable gaps in workforce training and ethical oversight. Moreover, the results underscore the need for a more coordinated approach to GenAI governance, emphasizing collaboration among senior leaders, clinicians, information technology staff, and researchers. Our analysis also reveals concerns regarding GenAI bias, data security, and stakeholder trust, which must be addressed to ensure the ethical and effective implementation of GenAI technologies. This study offers valuable insights into the challenges and opportunities of GenAI integration in healthcare, providing a roadmap for institutions aiming to leverage GenAI for improved quality of care and operational efficiency.

LGSep 15, 2023
Deep Reinforcement Learning for Efficient and Fair Allocation of Health Care Resources

Yikuan Li, Chengsheng Mao, Kaixuan Huang et al.

Scarcity of health care resources could result in the unavoidable consequence of rationing. For example, ventilators are often limited in supply, especially during public health emergencies or in resource-constrained health care settings, such as amid the pandemic of COVID-19. Currently, there is no universally accepted standard for health care resource allocation protocols, resulting in different governments prioritizing patients based on various criteria and heuristic-based protocols. In this study, we investigate the use of reinforcement learning for critical care resource allocation policy optimization to fairly and effectively ration resources. We propose a transformer-based deep Q-network to integrate the disease progression of individual patients and the interaction effects among patients during the critical care resource allocation. We aim to improve both fairness of allocation and overall patient outcomes. Our experiments demonstrate that our method significantly reduces excess deaths and achieves a more equitable distribution under different levels of ventilator shortage, when compared to existing severity-based and comorbidity-based methods in use by different governments. Our source code is included in the supplement and will be released on Github upon publication.

CLAug 26, 2023
Exploring Large Language Models for Knowledge Graph Completion

Liang Yao, Jiazhen Peng, Chengsheng Mao et al.

Knowledge graphs play a vital role in numerous artificial intelligence tasks, yet they frequently face the issue of incompleteness. In this study, we explore utilizing Large Language Models (LLM) for knowledge graph completion. We consider triples in knowledge graphs as text sequences and introduce an innovative framework called Knowledge Graph LLM (KG-LLM) to model these triples. Our technique employs entity and relation descriptions of a triple as prompts and utilizes the response for predictions. Experiments on various benchmark knowledge graphs demonstrate that our method attains state-of-the-art performance in tasks such as triple classification and relation prediction. We also find that fine-tuning relatively smaller models (e.g., LLaMA-7B, ChatGLM-6B) outperforms recent ChatGPT and GPT-4.

LGFeb 1, 2023
Using Machine Learning to Develop Smart Reflex Testing Protocols

Matthew McDermott, Anand Dighe, Peter Szolovits et al. · harvard

Objective: Reflex testing protocols allow clinical laboratories to perform second line diagnostic tests on existing specimens based on the results of initially ordered tests. Reflex testing can support optimal clinical laboratory test ordering and diagnosis. In current clinical practice, reflex testing typically relies on simple "if-then" rules; however, this limits their scope since most test ordering decisions involve more complexity than a simple rule will allow. Here, using the analyte ferritin as an example, we propose an alternative machine learning-based approach to "smart" reflex testing with a wider scope and greater impact than traditional rule-based approaches. Methods: Using patient data, we developed a machine learning model to predict whether a patient getting CBC testing will also have ferritin testing ordered, consider applications of this model to "smart" reflex testing, and evaluate the model by comparing its performance to possible rule-based approaches. Results: Our underlying machine learning models performed moderately well in predicting ferritin test ordering and demonstrated greater suitability to reflex testing than rule-based approaches. Using chart review, we demonstrate that our model may improve ferritin test ordering. Finally, as a secondary goal, we demonstrate that ferritin test results are missing not at random (MNAR), a finding with implications for unbiased imputation of missing test results. Conclusions: Machine learning may provide a foundation for new types of reflex testing with enhanced benefits for clinical diagnosis and laboratory utilization management.

IVApr 20, 2023
Medical Image Deidentification, Cleaning and Compression Using Pylogik

Adrienne Kline, Vinesh Appadurai, Yuan Luo et al.

Leveraging medical record information in the era of big data and machine learning comes with the caveat that data must be cleaned and de-identified. Facilitating data sharing and harmonization for multi-center collaborations are particularly difficult when protected health information (PHI) is contained or embedded in image meta-data. We propose a novel library in the Python framework, called PyLogik, to help alleviate this issue for ultrasound images, which are particularly challenging because of the frequent inclusion of PHI directly on the images. PyLogik processes the image volumes through a series of text detection/extraction, filtering, thresholding, morphological and contour comparisons. This methodology de-identifies the images, reduces file sizes, and prepares image volumes for applications in deep learning and data sharing. To evaluate its effectiveness in processing ultrasound data, a random sample of 50 cardiac ultrasounds (echocardiograms) were processed through PyLogik, and the outputs were compared with the manual segmentations by an expert user. The Dice coefficient of the two approaches achieved an average value of 0.976. Next, an investigation was conducted to ascertain the degree of information compression achieved using the algorithm. Resultant data was found to be on average ~72% smaller after processing by PyLogik. Our results suggest that PyLogik is a viable methodology for data cleaning and de-identification, determining ROI, and file compression which will facilitate efficient storage, use, and dissemination of ultrasound data. Variants of the pipeline have also been created for use with other medical imaging data types.

CLJul 5, 2022
Deep Learning Reveals Patterns of Diverse and Changing Sentiments Towards COVID-19 Vaccines Based on 11 Million Tweets

Hanyin Wang, Meghan R. Hutch, Yikuan Li et al.

Over 12 billion doses of COVID-19 vaccines have been administered at the time of writing. However, public perceptions of vaccines have been complex. We analyzed COVID-19 vaccine-related tweets to understand the evolving perceptions of COVID-19 vaccines. We finetuned a deep learning classifier using a state-of-the-art model, XLNet, to detect each tweet's sentiment automatically. We employed validated methods to extract the users' race or ethnicity, gender, age, and geographical locations from user profiles. Incorporating multiple data sources, we assessed the sentiment patterns among subpopulations and juxtaposed them against vaccine uptake data to unravel their interactive patterns. 11,211,672 COVID-19 vaccine-related tweets corresponding to 2,203,681 users over two years were analyzed. The finetuned model for sentiment classification yielded an accuracy of 0.92 on testing set. Users from various demographic groups demonstrated distinct patterns in sentiments towards COVID-19 vaccines. User sentiments became more positive over time, upon which we observed subsequent upswing in the population-level vaccine uptake. Surrounding dates where positive sentiments crest, we detected encouraging news or events regarding vaccine development and distribution. Positive sentiments in pregnancy-related tweets demonstrated a delayed pattern compared with trends in general population, with postponed vaccine uptake trends. Distinctive patterns across subpopulations suggest the need of tailored strategies. Global news and events profoundly involved in shaping users' thoughts on social media. Populations with additional concerns, such as pregnancy, demonstrated more substantial hesitancy since lack of timely recommendations. Feature analysis revealed hesitancies of various subpopulations stemmed from clinical trial logics, risks and complications, and urgency of scientific evidence.

CLMay 7, 2022
AKI-BERT: a Pre-trained Clinical Language Model for Early Prediction of Acute Kidney Injury

Chengsheng Mao, Liang Yao, Yuan Luo

Acute kidney injury (AKI) is a common clinical syndrome characterized by a sudden episode of kidney failure or kidney damage within a few hours or a few days. Accurate early prediction of AKI for patients in ICU who are more likely than others to have AKI can enable timely interventions, and reduce the complications of AKI. Much of the clinical information relevant to AKI is captured in clinical notes that are largely unstructured text and requires advanced natural language processing (NLP) for useful information extraction. On the other hand, pre-trained contextual language models such as Bidirectional Encoder Representations from Transformers (BERT) have improved performances for many NLP tasks in general domain recently. However, few have explored BERT on disease-specific medical domain tasks such as AKI early prediction. In this paper, we try to apply BERT to specific diseases and present an AKI domain-specific pre-trained language model based on BERT (AKI-BERT) that could be used to mine the clinical notes for early prediction of AKI. AKI-BERT is a BERT model pre-trained on the clinical notes of patients having risks for AKI. Our experiments on Medical Information Mart for Intensive Care III (MIMIC-III) dataset demonstrate that AKI-BERT can yield performance improvements for early AKI prediction, thus expanding the utility of the BERT model from general clinical domain to disease-specific domain.

LGMar 5, 2022
Machine Learning Applications in Lung Cancer Diagnosis, Treatment and Prognosis

Yawei Li, Xin Wu, Ping Yang et al.

The recent development of imaging and sequencing technologies enables systematic advances in the clinical study of lung cancer. Meanwhile, the human mind is limited in effectively handling and fully utilizing the accumulation of such enormous amounts of data. Machine learning-based approaches play a critical role in integrating and analyzing these large and complex datasets, which have extensively characterized lung cancer through the use of different perspectives from these accrued data. In this article, we provide an overview of machine learning-based approaches that strengthen the varying aspects of lung cancer diagnosis and therapy, including early detection, auxiliary diagnosis, prognosis prediction and immunotherapy practice. Moreover, we highlight the challenges and opportunities for future applications of machine learning in lung cancer.

CLSep 19, 2023
Enhancing Health Data Interoperability with Large Language Models: A FHIR Study

Yikuan Li, Hanyin Wang, Halid Yerebakan et al.

In this study, we investigated the ability of the large language model (LLM) to enhance healthcare data interoperability. We leveraged the LLM to convert clinical texts into their corresponding FHIR resources. Our experiments, conducted on 3,671 snippets of clinical text, demonstrated that the LLM not only streamlines the multi-step natural language processing and human calibration processes but also achieves an exceptional accuracy rate of over 90% in exact matches when compared to human annotations.

74.8GTMay 21
Joint Communication and Computation Scheduling for MEC-enabled AIGC Services: A Game-Theoretic Stochastic Learning Approach

Huaizhe Liu, Xinyi Zhuang, Jiaqi Wu et al.

Artificial Intelligence Generated Content (AIGC) powered by Generative Diffusion Models (GDMs) has emerged as a transformative paradigm for automated content creation. To satisfy the stringent latency requirements of AIGC services in many edge intelligence scenarios (e.g., smart cities), Mobile Edge Computing (MEC) provides critical computational support by deploying GDMs at edge servers (ES) close to end users. This paper investigates an MEC-enabled AIGC network comprising multiple ES, wireless access points (APs), and mobile users (UEs) with heterogeneous latency and accuracy demands. We formulate a Joint Communication Association and Computation Offloading (JCACO) game, where each UE strategically selects its serving AP, ES, and inference steps to minimize the overall service completion time while meeting accuracy constraints. The problem is challenging due to the network dynamics and the incomplete information. We prove that the JCACO game is a potential game under both complete and stochastic information settings, ensuring the existence of Nash Equilibrium (NE) in both cases. To derive the NE efficiently, we develop a distributed Multi-Agent Stochastic Learning (MASL) algorithm that provably converges to the NE with strict performance guarantees. Unlike conventional best-response schemes, MASL requires neither the knowledge of other players' strategies nor global network information, making it fully distributed and adaptive to dynamic environments. We further provide a strict theoretical convergence analysis for MASL by using Ordinary Differential Equations (ODEs). Simulation results demonstrate that MASL significantly reduces service completion time compared with benchmark methods while satisfying accuracy constraints, confirming its effectiveness and practicality for real-world MEC-enabled AIGC networks.

97.3SEApr 21Code
PlayCoder: Making LLM-Generated GUI Code Playable

Zhiyuan Peng, Wei Tao, Xin Yin et al.

Large language models (LLMs) have achieved strong results in code generation, but their ability to generate GUI applications, especially games, remains insufficiently studied. Existing benchmarks mainly evaluate correctness through test cases, which are inadequate for GUI applications because these systems are interactive, event-driven, and require correct state transitions across sequences of user actions. Their evaluation therefore should consider interaction flows and UI logic rather than only pass/fail outcomes. To study this problem, we introduce PlayEval, a repository-aware benchmark built from 43 multilingual GUI applications in Python, TypeScript, and JavaScript. Unlike prior GUI benchmarks that are difficult to adapt to desktop environments, PlayEval covers six major GUI application categories and directly supports code-generation evaluation. We further propose Play@k, a metric that measures whether at least one of *k* generated candidates can be played end-to-end without logical errors. To support reliable evaluation, we develop PlayTester, an LLM-based agent that performs task-oriented GUI playthroughs and detects logic violations automatically. Experiments on 10 state-of-the-art code LLMs show that, despite high compilation rates, they achieve near-zero Play@3, revealing major weaknesses in generating logically correct GUI applications. To address this limitation, we present PlayCoder, a multi-agent, repository-aware framework that generates, evaluates, and iteratively repairs GUI application code in a closed loop. PlayCoder substantially improves both functional correctness and semantic alignment for open-source and closed-source models, reaching up to 38.1% Exec@3 and 20.3% Play@3. Case studies further show that it can uncover silent logic bugs missed by traditional metrics and fix them through targeted edits.

MLFeb 8, 2023
IRTCI: Item Response Theory for Categorical Imputation

Adrienne Kline, Yuan Luo

Most datasets suffer from partial or complete missing values, which has downstream limitations on the available models on which to test the data and on any statistical inferences that can be made from the data. Several imputation techniques have been designed to replace missing data with stand in values. The various approaches have implications for calculating clinical scores, model building and model testing. The work showcased here offers a novel means for categorical imputation based on item response theory (IRT) and compares it against several methodologies currently used in the machine learning field including k-nearest neighbors (kNN), multiple imputed chained equations (MICE) and Amazon Web Services (AWS) deep learning method, Datawig. Analyses comparing these techniques were performed on three different datasets that represented ordinal, nominal and binary categories. The data were modified so that they also varied on both the proportion of data missing and the systematization of the missing data. Two different assessments of performance were conducted: accuracy in reproducing the missing values, and predictive performance using the imputed data. Results demonstrated that the new method, Item Response Theory for Categorical Imputation (IRTCI), fared quite well compared to currently used methods, outperforming several of them in many conditions. Given the theoretical basis for the new approach, and the unique generation of probabilistic terms for determining category belonging for missing cells, IRTCI offers a viable alternative to current approaches.

CVApr 11, 2024Code
DGMamba: Domain Generalization via Generalized State Space Model

Shaocong Long, Qianyu Zhou, Xiangtai Li et al.

Domain generalization~(DG) aims at solving distribution shift problems in various scenes. Existing approaches are based on Convolution Neural Networks (CNNs) or Vision Transformers (ViTs), which suffer from limited receptive fields or quadratic complexities issues. Mamba, as an emerging state space model (SSM), possesses superior linear complexity and global receptive fields. Despite this, it can hardly be applied to DG to address distribution shifts, due to the hidden state issues and inappropriate scan mechanisms. In this paper, we propose a novel framework for DG, named DGMamba, that excels in strong generalizability toward unseen domains and meanwhile has the advantages of global receptive fields, and efficient linear complexity. Our DGMamba compromises two core components: Hidden State Suppressing~(HSS) and Semantic-aware Patch refining~(SPR). In particular, HSS is introduced to mitigate the influence of hidden states associated with domain-specific features during output prediction. SPR strives to encourage the model to concentrate more on objects rather than context, consisting of two designs: Prior-Free Scanning~(PFS), and Domain Context Interchange~(DCI). Concretely, PFS aims to shuffle the non-semantic patches within images, creating more flexible and effective sequences from images, and DCI is designed to regularize Mamba with the combination of mismatched non-semantic and semantic information by fusing patches among domains. Extensive experiments on five commonly used DG benchmarks demonstrate that the proposed DGMamba achieves remarkably superior results to state-of-the-art models. The code will be made publicly available at https://github.com/longshaocong/DGMamba.

64.4GTMar 28
Recruiting Heterogeneous Crowdsource Vehicles for Updating a High-definition Map

Wentao Ye, Yuan Luo, Bo Liu et al.

The high-definition map is a cornerstone of autonomous driving. Unlike constructing a costly fleet of mapping vehicles, the crowdsourcing paradigm is a cost-effective way to keep an HD map up to date. Achieving practical success for crowdsourcing-based HD maps is contingent on addressing two critical issues: freshness and recruitment costs. Given that crowdsource vehicles are often heterogeneous in terms of operational costs and sensing capabilities, it is practical to recruit heterogeneous crowdsource vehicles to achieve the tradeoff between freshness and recruitment costs. However, existing works neglect this aspect. To solve it, we formulate this problem as a Markov decision process. We demonstrate that the optimal policy is threshold-type age-dependent. Additionally, our findings reveal some counter-intuitive insights. In some cases, the company should initiate vehicle recruitment earlier when vehicles arrive more frequently, or have higher operational costs or sensing capabilities.} Besides, we propose an efficient algorithm, called the bound-based relative value iteration (BRVI) algorithm, to overcome the technical challenge that finding an optimal policy is time-consuming. Numerical simulations show that (i) the optimal policy reduces the average cost by $19.04\%$ compared to the state-of-the-art mechanism}, and (ii) the proposed algorithm can reduce the convergence time by $13.66\%$ on average compared to the existing algorithm.

CLSep 26, 2025Code
AMANDA: Agentic Medical Knowledge Augmentation for Data-Efficient Medical Visual Question Answering

Ziqing Wang, Chengsheng Mao, Xiaole Wen et al.

Medical Multimodal Large Language Models (Med-MLLMs) have shown great promise in medical visual question answering (Med-VQA). However, when deployed in low-resource settings where abundant labeled data are unavailable, existing Med-MLLMs commonly fail due to their medical reasoning capability bottlenecks: (i) the intrinsic reasoning bottleneck that ignores the details from the medical image; (ii) the extrinsic reasoning bottleneck that fails to incorporate specialized medical knowledge. To address those limitations, we propose AMANDA, a training-free agentic framework that performs medical knowledge augmentation via LLM agents. Specifically, our intrinsic medical knowledge augmentation focuses on coarse-to-fine question decomposition for comprehensive diagnosis, while extrinsic medical knowledge augmentation grounds the reasoning process via biomedical knowledge graph retrieval. Extensive experiments across eight Med-VQA benchmarks demonstrate substantial improvements in both zero-shot and few-shot Med-VQA settings. The code is available at https://github.com/REAL-Lab-NU/AMANDA.

CLJan 27, 2022Code
Clinical-Longformer and Clinical-BigBird: Transformers for long clinical sequences

Yikuan Li, Ramsey M. Wehbe, Faraz S. Ahmad et al.

Transformers-based models, such as BERT, have dramatically improved the performance for various natural language processing tasks. The clinical knowledge enriched model, namely ClinicalBERT, also achieved state-of-the-art results when performed on clinical named entity recognition and natural language inference tasks. One of the core limitations of these transformers is the substantial memory consumption due to their full self-attention mechanism. To overcome this, long sequence transformer models, e.g. Longformer and BigBird, were proposed with the idea of sparse attention mechanism to reduce the memory usage from quadratic to the sequence length to a linear scale. These models extended the maximum input sequence length from 512 to 4096, which enhanced the ability of modeling long-term dependency and consequently achieved optimal results in a variety of tasks. Inspired by the success of these long sequence transformer models, we introduce two domain enriched language models, namely Clinical-Longformer and Clinical-BigBird, which are pre-trained from large-scale clinical corpora. We evaluate both pre-trained models using 10 baseline tasks including named entity recognition, question answering, and document classification tasks. The results demonstrate that Clinical-Longformer and Clinical-BigBird consistently and significantly outperform ClinicalBERT as well as other short-sequence transformers in all downstream tasks. We have made our source code available at [https://github.com/luoyuanlab/Clinical-Longformer] the pre-trained models available for public download at: [https://huggingface.co/yikuan8/Clinical-Longformer].

CVMar 31, 2019Code
ImageGCN: Multi-Relational Image Graph Convolutional Networks for Disease Identification with Chest X-rays

Chengsheng Mao, Liang Yao, Yuan Luo

Image representation is a fundamental task in computer vision. However, most of the existing approaches for image representation ignore the relations between images and consider each input image independently. Intuitively, relations between images can help to understand the images and maintain model consistency over related images, leading to better explainability. In this paper, we consider modeling the image-level relations to generate more informative image representations, and propose ImageGCN, an end-to-end graph convolutional network framework for inductive multi-relational image modeling. We apply ImageGCN to chest X-ray images where rich relational information is available for disease identification. Unlike previous image representation models, ImageGCN learns the representation of an image using both its original pixel features and its relationship with other images. Besides learning informative representations for images, ImageGCN can also be used for object detection in a weakly supervised manner. The experimental results on 3 open-source x-ray datasets, ChestX-ray14, CheXpert and MIMIC-CXR demonstrate that ImageGCN can outperform respective baselines in both disease identification and localization tasks and can achieve comparable and often better results than the state-of-the-art methods.

LGNov 4, 2023
Machine learning's own Industrial Revolution

Yuan Luo, Song Han, Jingjing Liu

Machine learning is expected to enable the next Industrial Revolution. However, lacking standardized and automated assembly networks, ML faces significant challenges to meet ever-growing enterprise demands and empower broad industries. In the Perspective, we argue that ML needs to first complete its own Industrial Revolution, elaborate on how to best achieve its goals, and discuss new opportunities to enable rapid translation from ML's innovation frontier to mass production and utilization.

76.2GTMar 28
Efficient and Cost-effective Vehicle Recruitment for HD Map Crowdsourcing

Wentao Ye, Yuan Luo, Bo Liu et al.

The high-definition (HD) map is a cornerstone of autonomous driving. The crowdsourcing paradigm is a cost-effective way to keep an HD map up-to-date. Current HD map crowdsourcing mechanisms aim to enhance HD map freshness within recruitment budgets. However, many overlook unique and critical traits of crowdsourcing vehicles, such as random arrival and heterogeneity, leading to either compromised map freshness or excessive recruitment costs. Furthermore, these characteristics complicate the characterization of the feasible space of the optimal recruitment policy, necessitating a method to compute it efficiently in dynamic transportation scenarios.To overcome these challenges, we propose an efficient and cost-effective vehicle recruitment (ENTER) mechanism. Specifically, the ENTER mechanism has a threshold structure and balances freshness with recruitment costs while accounting for the vehicles' random arrival and heterogeneity. It also integrates the bound-based relative value iteration (RVI) algorithm, which utilizes the threshold-type structure and upper bounds of thresholds to reduce the feasible space and expedite convergence. Numerical results show that the proposed ENTER mechanism increases the HD map company's payoff by 23.40% and 43.91% compared to state-of-the-art mechanisms that do not account for vehicle heterogeneity and random arrivals, respectively. Furthermore, the bound-based RVI algorithm in the ENTER mechanism reduces computation time by an average of 18.91% compared to the leading RVI-based algorithm.

GTJan 8
Mechanism Design for Federated Learning with Non-Monotonic Network Effects

Xiang Li, Bing Luo, Jianwei Huang et al.

Mechanism design is pivotal to federated learning (FL) for maximizing social welfare by coordinating self-interested clients. Existing mechanisms, however, often overlook the network effects of client participation and the diverse model performance requirements (i.e., generalization error) across applications, leading to suboptimal incentives and social welfare, or even inapplicability in real deployments. To address this gap, we explore incentive mechanism design for FL with network effects and application-specific requirements of model performance. We develop a theoretical model to quantify the impact of network effects on heterogeneous client participation, revealing the non-monotonic nature of such effects. Based on these insights, we propose a Model Trading and Sharing (MoTS) framework, which enables clients to obtain FL models through either participation or purchase. To further address clients' strategic behaviors, we design a Social Welfare maximization with Application-aware and Network effects (SWAN) mechanism, exploiting model customer payments for incentivization. Experimental results on a hardware prototype demonstrate that our SWAN mechanism outperforms existing FL mechanisms, improving social welfare by up to $352.42\%$ and reducing extra incentive costs by $93.07\%$.

CVApr 16, 2025
RADLER: Radar Object Detection Leveraging Semantic 3D City Models and Self-Supervised Radar-Image Learning

Yuan Luo, Rudolf Hoffmann, Yan Xia et al.

Semantic 3D city models are worldwide easy-accessible, providing accurate, object-oriented, and semantic-rich 3D priors. To date, their potential to mitigate the noise impact on radar object detection remains under-explored. In this paper, we first introduce a unique dataset, RadarCity, comprising 54K synchronized radar-image pairs and semantic 3D city models. Moreover, we propose a novel neural network, RADLER, leveraging the effectiveness of contrastive self-supervised learning (SSL) and semantic 3D city models to enhance radar object detection of pedestrians, cyclists, and cars. Specifically, we first obtain the robust radar features via a SSL network in the radar-image pretext task. We then use a simple yet effective feature fusion strategy to incorporate semantic-depth features from semantic 3D city models. Having prior 3D information as guidance, RADLER obtains more fine-grained details to enhance radar object detection. We extensively evaluate RADLER on the collected RadarCity dataset and demonstrate average improvements of 5.46% in mean avarage precision (mAP) and 3.51% in mean avarage recall (mAR) over previous radar object detection methods. We believe this work will foster further research on semantic-guided and map-supported radar object detection. Our project page is publicly available athttps://gpp-communication.github.io/RADLER .

CVMay 27, 2025
Temporal Saliency-Guided Distillation: A Scalable Framework for Distilling Video Datasets

Xulin Gu, Xinhao Zhong, Zhixing Wei et al.

Dataset distillation (DD) has emerged as a powerful paradigm for dataset compression, enabling the synthesis of compact surrogate datasets that approximate the training utility of large-scale ones. While significant progress has been achieved in distilling image datasets, extending DD to the video domain remains challenging due to the high dimensionality and temporal complexity inherent in video data. Existing video distillation (VD) methods often suffer from excessive computational costs and struggle to preserve temporal dynamics, as naïve extensions of image-based approaches typically lead to degraded performance. In this paper, we propose a novel uni-level video dataset distillation framework that directly optimizes synthetic videos with respect to a pre-trained model. To address temporal redundancy and enhance motion preservation, we introduce a temporal saliency-guided filtering mechanism that leverages inter-frame differences to guide the distillation process, encouraging the retention of informative temporal cues while suppressing frame-level redundancy. Extensive experiments on standard video benchmarks demonstrate that our method achieves state-of-the-art performance, bridging the gap between real and distilled video data and offering a scalable solution for video dataset compression.

CVApr 9, 2025
Domain Generalization via Discrete Codebook Learning

Shaocong Long, Qianyu Zhou, Xikun Jiang et al.

Domain generalization (DG) strives to address distribution shifts across diverse environments to enhance model's generalizability. Current DG approaches are confined to acquiring robust representations with continuous features, specifically training at the pixel level. However, this DG paradigm may struggle to mitigate distribution gaps in dealing with a large space of continuous features, rendering it susceptible to pixel details that exhibit spurious correlations or noise. In this paper, we first theoretically demonstrate that the domain gaps in continuous representation learning can be reduced by the discretization process. Based on this inspiring finding, we introduce a novel learning paradigm for DG, termed Discrete Domain Generalization (DDG). DDG proposes to use a codebook to quantize the feature map into discrete codewords, aligning semantic-equivalent information in a shared discrete representation space that prioritizes semantic-level information over pixel-level intricacies. By learning at the semantic level, DDG diminishes the number of latent features, optimizing the utilization of the representation space and alleviating the risks associated with the wide-ranging space of continuous features. Extensive experiments across widely employed benchmarks in DG demonstrate DDG's superior performance compared to state-of-the-art approaches, underscoring its potential to reduce the distribution gaps and enhance the model's generalizability.

CVJun 15, 2025
Intriguing Frequency Interpretation of Adversarial Robustness for CNNs and ViTs

Lu Chen, Han Yang, Hu Wang et al.

Adversarial examples have attracted significant attention over the years, yet understanding their frequency-based characteristics remains insufficient. In this paper, we investigate the intriguing properties of adversarial examples in the frequency domain for the image classification task, with the following key findings. (1) As the high-frequency components increase, the performance gap between adversarial and natural examples becomes increasingly pronounced. (2) The model performance against filtered adversarial examples initially increases to a peak and declines to its inherent robustness. (3) In Convolutional Neural Networks, mid- and high-frequency components of adversarial examples exhibit their attack capabilities, while in Transformers, low- and mid-frequency components of adversarial examples are particularly effective. These results suggest that different network architectures have different frequency preferences and that differences in frequency components between adversarial and natural examples may directly influence model robustness. Based on our findings, we further conclude with three useful proposals that serve as a valuable reference to the AI model security community.

CVApr 3, 2025
Generative Classifier for Domain Generalization

Shaocong Long, Qianyu Zhou, Xiangtai Li et al.

Domain generalization (DG) aims to improve the generalizability of computer vision models toward distribution shifts. The mainstream DG methods focus on learning domain invariance, however, such methods overlook the potential inherent in domain-specific information. While the prevailing practice of discriminative linear classifier has been tailored to domain-invariant features, it struggles when confronted with diverse domain-specific information, e.g., intra-class shifts, that exhibits multi-modality. To address these issues, we explore the theoretical implications of relying on domain invariance, revealing the crucial role of domain-specific information in mitigating the target risk for DG. Drawing from these insights, we propose Generative Classifier-driven Domain Generalization (GCDG), introducing a generative paradigm for the DG classifier based on Gaussian Mixture Models (GMMs) for each class across domains. GCDG consists of three key modules: Heterogeneity Learning Classifier~(HLC), Spurious Correlation Blocking~(SCB), and Diverse Component Balancing~(DCB). Concretely, HLC attempts to model the feature distributions and thereby capture valuable domain-specific information via GMMs. SCB identifies the neural units containing spurious correlations and perturbs them, mitigating the risk of HLC learning spurious patterns. Meanwhile, DCB ensures a balanced contribution of components in HLC, preventing the underestimation or neglect of critical components. In this way, GCDG excels in capturing the nuances of domain-specific information characterized by diverse distributions. GCDG demonstrates the potential to reduce the target risk and encourage flat minima, improving the generalizability. Extensive experiments show GCDG's comparable performance on five DG benchmarks and one face anti-spoofing dataset, seamlessly integrating into existing DG methods with consistent improvements.

AIMar 23, 2025
Strategic Prompt Pricing for AIGC Services: A User-Centric Approach

Xiang Li, Bing Luo, Jianwei Huang et al.

The rapid growth of AI-generated content (AIGC) services has created an urgent need for effective prompt pricing strategies, yet current approaches overlook users' strategic two-step decision-making process in selecting and utilizing generative AI models. This oversight creates two key technical challenges: quantifying the relationship between user prompt capabilities and generation outcomes, and optimizing platform payoff while accounting for heterogeneous user behaviors. We address these challenges by introducing prompt ambiguity, a theoretical framework that captures users' varying abilities in prompt engineering, and developing an Optimal Prompt Pricing (OPP) algorithm. Our analysis reveals a counterintuitive insight: users with higher prompt ambiguity (i.e., lower capability) exhibit non-monotonic prompt usage patterns, first increasing then decreasing with ambiguity levels, reflecting complex changes in marginal utility. Experimental evaluation using a character-level GPT-like model demonstrates that our OPP algorithm achieves up to 31.72% improvement in platform payoff compared to existing pricing mechanisms, validating the importance of user-centric prompt pricing in AIGC services.

LGOct 17, 2024
Hiformer: Hybrid Frequency Feature Enhancement Inverted Transformer for Long-Term Wind Power Prediction

Chongyang Wan, Shunbo Lei, Yuan Luo

The increasing severity of climate change necessitates an urgent transition to renewable energy sources, making the large-scale adoption of wind energy crucial for mitigating environmental impact. However, the inherent uncertainty of wind power poses challenges for grid stability, underscoring the need for accurate wind energy prediction models to enable effective power system planning and operation. While many existing studies on wind power prediction focus on short-term forecasting, they often overlook the importance of long-term predictions. Long-term wind power forecasting is essential for effective power grid dispatch and market transactions, as it requires careful consideration of weather features such as wind speed and direction, which directly influence power output. Consequently, methods designed for short-term predictions may lead to inaccurate results and high computational costs in long-term settings. To adress these limitations, we propose a novel approach called Hybrid Frequency Feature Enhancement Inverted Transformer (Hiformer). Hiformer introduces a unique structure that integrates signal decomposition technology with weather feature extraction technique to enhance the modeling of correlations between meteorological conditions and wind power generation. Additionally, Hiformer employs an encoder-only architecture, which reduces the computational complexity associated with long-term wind power forecasting. Compared to the state-of-the-art methods, Hiformer: (i) can improve the prediction accuracy by up to 52.5\%; and (ii) can reduce computational time by up to 68.5\%.

LGApr 18, 2024
Privacy-Preserving UCB Decision Process Verification via zk-SNARKs

Xikun Jiang, He Lyu, Chenhao Ying et al.

With the increasingly widespread application of machine learning, how to strike a balance between protecting the privacy of data and algorithm parameters and ensuring the verifiability of machine learning has always been a challenge. This study explores the intersection of reinforcement learning and data privacy, specifically addressing the Multi-Armed Bandit (MAB) problem with the Upper Confidence Bound (UCB) algorithm. We introduce zkUCB, an innovative algorithm that employs the Zero-Knowledge Succinct Non-Interactive Argument of Knowledge (zk-SNARKs) to enhance UCB. zkUCB is carefully designed to safeguard the confidentiality of training data and algorithmic parameters, ensuring transparent UCB decision-making. Experiments highlight zkUCB's superior performance, attributing its enhanced reward to judicious quantization bit usage that reduces information entropy in the decision-making process. zkUCB's proof size and verification time scale linearly with the execution steps of zkUCB. This showcases zkUCB's adept balance between data security and operational efficiency. This approach contributes significantly to the ongoing discourse on reinforcing data privacy in complex decision-making processes, offering a promising solution for privacy-sensitive applications.

LGFeb 3, 2024
Contrasting Adversarial Perturbations: The Space of Harmless Perturbations

Lu Chen, Shaofeng Li, Benhao Huang et al.

Existing works have extensively studied adversarial examples, which are minimal perturbations that can mislead the output of deep neural networks (DNNs) while remaining imperceptible to humans. However, in this work, we reveal the existence of a harmless perturbation space, in which perturbations drawn from this space, regardless of their magnitudes, leave the network output unchanged when applied to inputs. Essentially, the harmless perturbation space emerges from the usage of non-injective functions (linear or non-linear layers) within DNNs, enabling multiple distinct inputs to be mapped to the same output. For linear layers with input dimensions exceeding output dimensions, any linear combination of the orthogonal bases of the nullspace of the parameter consistently yields no change in their output. For non-linear layers, the harmless perturbation space may expand, depending on the properties of the layers and input samples. Inspired by this property of DNNs, we solve for a family of general perturbation spaces that are redundant for the DNN's decision, and can be used to hide sensitive data and serve as a means of model identification. Our work highlights the distinctive robustness of DNNs (i.e., consistency under large magnitude perturbations) in contrast to adversarial examples (vulnerability for small imperceptible noises).

LGFeb 27, 2022
Distribution Preserving Graph Representation Learning

Chengsheng Mao, Yuan Luo

Graph neural network (GNN) is effective to model graphs for distributed representations of nodes and an entire graph. Recently, research on the expressive power of GNN attracted growing attention. A highly-expressive GNN has the ability to generate discriminative graph representations. However, in the end-to-end training process for a certain graph learning task, a highly-expressive GNN risks generating graph representations overfitting the training data for the target task, while losing information important for the model generalization. In this paper, we propose Distribution Preserving GNN (DP-GNN) - a GNN framework that can improve the generalizability of expressive GNN models by preserving several kinds of distribution information in graph representations and node representations. Besides the generalizability, by applying an expressive GNN backbone, DP-GNN can also have high expressive power. We evaluate the proposed DP-GNN framework on multiple benchmark datasets for graph classification tasks. The experimental results demonstrate that our model achieves state-of-the-art performances.

LGJan 31, 2022
Topology-Preserving Dimensionality Reduction via Interleaving Optimization

Bradley J. Nelson, Yuan Luo

Dimensionality reduction techniques are powerful tools for data preprocessing and visualization which typically come with few guarantees concerning the topological correctness of an embedding. The interleaving distance between the persistent homology of Vietoris-Rips filtrations can be used to identify a scale at which topological features such as clusters or holes in an embedding and original data set are in correspondence. We show how optimization seeking to minimize the interleaving distance can be incorporated into dimensionality reduction algorithms, and explicitly demonstrate its use in finding an optimal linear projection. We demonstrate the utility of this framework to data visualization.

LGJan 9, 2022
Open-Set Recognition of Breast Cancer Treatments

Alexander Cao, Diego Klabjan, Yuan Luo

Open-set recognition generalizes a classification task by classifying test samples as one of the known classes from training or "unknown." As novel cancer drug cocktails with improved treatment are continually discovered, predicting cancer treatments can naturally be formulated in terms of an open-set recognition problem. Drawbacks, due to modeling unknown samples during training, arise from straightforward implementations of prior work in healthcare open-set learning. Accordingly, we reframe the problem methodology and apply a recent existing Gaussian mixture variational autoencoder model, which achieves state-of-the-art results for image datasets, to breast cancer patient data. Not only do we obtain more accurate and robust classification results, with a 24.5% average F1 increase compared to a recent method, but we also reexamine open-set recognition in terms of deployability to a clinical setting.

LGDec 24, 2021
Constrained tensor factorization for computational phenotyping and mortality prediction in patients with cancer

Francisco Y Cai, Chengsheng Mao, Yuan Luo

Background: The increasing adoption of electronic health records (EHR) across the US has created troves of computable data, to which machine learning methods have been applied to extract useful insights. EHR data, represented as a three-dimensional analogue of a matrix (tensor), is decomposed into two-dimensional factors that can be interpreted as computational phenotypes. Methods: We apply constrained tensor factorization to derive computational phenotypes and predict mortality in cohorts of patients with breast, prostate, colorectal, or lung cancer in the Northwestern Medicine Enterprise Data Warehouse from 2000 to 2015. In our experiments, we examined using a supervised term in the factorization algorithm, filtering tensor co-occurrences by medical indication, and incorporating additional social determinants of health (SDOH) covariates in the factorization process. We evaluated the resulting computational phenotypes qualitatively and by assessing their ability to predict five-year mortality using the area under the curve (AUC) statistic. Results: Filtering by medical indication led to more concise and interpretable phenotypes. Mortality prediction performance (AUC) varied under the different experimental conditions and by cancer type (breast: 0.623 - 0.694, prostate: 0.603 - 0.750, colorectal: 0.523 - 0.641, and lung: 0.517 - 0.623). Generally, prediction performance improved with the use of a supervised term and the incorporation of SDOH covariates. Conclusion: Constrained tensor factorization, applied to sparse EHR data of patients with cancer, can discover computational phenotypes predictive of five-year mortality. The incorporation of SDOH variables into the factorization algorithm is an easy-to-implement and effective way to improve prediction performance.

LGDec 20, 2021
Natural language processing to identify lupus nephritis phenotype in electronic health records

Yu Deng, Jennifer A. Pacheco, Anh Chung et al.

Systemic lupus erythematosus (SLE) is a rare autoimmune disorder characterized by an unpredictable course of flares and remission with diverse manifestations. Lupus nephritis, one of the major disease manifestations of SLE for organ damage and mortality, is a key component of lupus classification criteria. Accurately identifying lupus nephritis in electronic health records (EHRs) would therefore benefit large cohort observational studies and clinical trials where characterization of the patient population is critical for recruitment, study design, and analysis. Lupus nephritis can be recognized through procedure codes and structured data, such as laboratory tests. However, other critical information documenting lupus nephritis, such as histologic reports from kidney biopsies and prior medical history narratives, require sophisticated text processing to mine information from pathology reports and clinical notes. In this study, we developed algorithms to identify lupus nephritis with and without natural language processing (NLP) using EHR data. We developed four algorithms: a rule-based algorithm using only structured data (baseline algorithm) and three algorithms using different NLP models. The three NLP models are based on regularized logistic regression and use different sets of features including positive mention of concept unique identifiers (CUIs), number of appearances of CUIs, and a mixture of three components respectively. The baseline algorithm and the best performed NLP algorithm were external validated on a dataset from Vanderbilt University Medical Center (VUMC). Our best performing NLP model incorporating features from both structured data, regular expression concepts, and mapped CUIs improved F measure in both the NMEDW (0.41 vs 0.79) and VUMC (0.62 vs 0.96) datasets compared to the baseline lupus nephritis algorithm.

LGDec 15, 2021
Disparities in Social Determinants among Performances of Mortality Prediction with Machine Learning for Sepsis Patients

Hanyin Wang, Yikuan Li, Andrew Naidech et al.

Background Sepsis is one of the most life-threatening circumstances for critically ill patients in the US, while a standardized criteria for sepsis identification is still under development. Disparities in social determinants of sepsis patients can interfere with the risk prediction performances using machine learning. Methods Disparities in social determinants, including race, gender, marital status, insurance types and languages, among patients identified by six available sepsis criteria were revealed by forest plots. Sixteen machine learning classifiers were trained to predict in-hospital mortality for sepsis patients. The performance of the trained model was tested on the entire randomly conducted test set and each sub-population built based on each of the following social determinants: race, gender, marital status, insurance type, and language. Results We analyzed a total of 11,791 critical care patients from the MIMIC-III database. Within the population identified by each sepsis identification method, significant differences were observed among sub-populations regarding race, marital status, insurance type, and language. On the 5,783 sepsis patients identified by the Sepsis-3 criteria statistically significant performance decreases for mortality prediction were observed when applying the trained machine learning model on Asian and Hispanic patients. With pairwise comparison, we detected performance discrepancies in mortality prediction between Asian and White patients, Asians and patients of other races, as well as English-speaking and Spanish-speaking patients. Conclusions Disparities in proportions of patients identified by various sepsis criteria were detected among the different social determinant groups. To achieve accurate diagnosis, a versatile diagnostic system for sepsis is needed to overcome the social determinant disparities of patients.

QMNov 19, 2021
SNPs Filtered by Allele Frequency Improve the Prediction of Hypertension Subtypes

Yiming Li, Sanjiv J. Shah, Donna Arnett et al.

Hypertension is the leading global cause of cardiovascular disease and premature death. Distinct hypertension subtypes may vary in their prognoses and require different treatments. An individual's risk for hypertension is determined by genetic and environmental factors as well as their interactions. In this work, we studied 911 African Americans and 1,171 European Americans in the Hypertension Genetic Epidemiology Network (HyperGEN) cohort. We built hypertension subtype classification models using both environmental variables and sets of genetic features selected based on different criteria. The fitted prediction models provided insights into the genetic landscape of hypertension subtypes, which may aid personalized diagnosis and treatment of hypertension in the future.

LGNov 18, 2021
Assessing Social Determinants-Related Performance Bias of Machine Learning Models: A case of Hyperchloremia Prediction in ICU Population

Songzi Liu, Yuan Luo

Machine learning in medicine leverages the wealth of healthcare data to extract knowledge, facilitate clinical decision-making, and ultimately improve care delivery. However, ML models trained on datasets that lack demographic diversity could yield suboptimal performance when applied to the underrepresented populations (e.g. ethnic minorities, lower social-economic status), thus perpetuating health disparity. In this study, we evaluated four classifiers built to predict Hyperchloremia - a condition that often results from aggressive fluids administration in the ICU population - and compared their performance in racial, gender, and insurance subgroups. We observed that adding social determinants features in addition to the lab-based ones improved model performance on all patients. The subgroup testing yielded significantly different AUC scores in 40 out of the 44 model-subgroup, suggesting disparities when applying ML models to social determinants subgroups. We urge future researchers to design models that proactively adjust for potential biases and include subgroup reporting in their studies.

CYNov 9, 2021
Early Prediction of Mortality in Critical Care Setting in Sepsis Patients Using Structured Features and Unstructured Clinical Notes

Jiyoung Shin, Yikuan Li, Yuan Luo

Sepsis is an important cause of mortality, especially in intensive care unit (ICU) patients. Developing novel methods to identify early mortality is critical for improving survival outcomes in sepsis patients. Using the MIMIC-III database, we integrated demographic data, physiological measurements and clinical notes. We built and applied several machine learning models to predict the risk of hospital mortality and 30-day mortality in sepsis patients. From the clinical notes, we generated clinically meaningful word representations and embeddings. Supervised learning classifiers and a deep learning architecture were used to construct prediction models. The configurations that utilized both structured and unstructured clinical features yielded competitive F-measure of 0.512. Our results showed that the approaches integrating both structured and unstructured clinical features can be effectively applied to assist clinicians in identifying the risk of mortality in sepsis patients upon admission to the ICU.

LGOct 31, 2021
Unsupervised Learning to Subphenotype Delirium Patients from Electronic Health Records

Yiqing Zhao, Yuan Luo

Delirium is a common acute onset brain dysfunction in the emergency setting and is associated with higher mortality. It is difficult to detect and monitor since its presentations and risk factors can be different depending on the underlying medical condition of patients. In our study, we aimed to identify subtypes within the delirium population and build subgroup-specific predictive models to detect delirium using Medical Information Mart for Intensive Care IV (MIMIC-IV) data. We showed that clusters exist within the delirium population. Differences in feature importance were also observed for subgroup-specific predictive models. Our work could recalibrate existing delirium prediction models for each delirium subgroup and improve the precision of delirium detection and monitoring for ICU or emergency department patients who had highly heterogeneous medical conditions.

LGAug 17, 2021
Aggregation Delayed Federated Learning

Ye Xue, Diego Klabjan, Yuan Luo

Federated learning is a distributed machine learning paradigm where multiple data owners (clients) collaboratively train one machine learning model while keeping data on their own devices. The heterogeneity of client datasets is one of the most important challenges of federated learning algorithms. Studies have found performance reduction with standard federated algorithms, such as FedAvg, on non-IID data. Many existing works on handling non-IID data adopt the same aggregation framework as FedAvg and focus on improving model updates either on the server side or on clients. In this work, we tackle this challenge in a different view by introducing redistribution rounds that delay the aggregation. We perform experiments on multiple tasks and show that the proposed framework significantly improves the performance on non-IID data.

LGFeb 22, 2021
Non-Convex Optimization with Spectral Radius Regularization

Adam Sandler, Diego Klabjan, Yuan Luo

We develop regularization methods to find flat minima while training deep neural networks. These minima generalize better than sharp minima, yielding models outperforming baselines on real-world test data (which may be distributed differently than the training data). Specifically, we propose a method of regularized optimization to reduce the spectral radius of the Hessian of the loss function. We also derive algorithms to efficiently optimize neural network models and prove that these algorithms almost surely converge. Furthermore, we demonstrate that our algorithm works effectively on applications in different domains, including healthcare. To show that our models generalize well, we introduced various methods for testing generalizability and found that our models outperform comparable baseline models on these tests.

QMDec 15, 2020
PANTHER: Pathway Augmented Nonnegative Tensor factorization for HighER-order feature learning

Yuan Luo, Chengsheng Mao

Genetic pathways usually encode molecular mechanisms that can inform targeted interventions. It is often challenging for existing machine learning approaches to jointly model genetic pathways (higher-order features) and variants (atomic features), and present to clinicians interpretable models. In order to build more accurate and better interpretable machine learning models for genetic medicine, we introduce Pathway Augmented Nonnegative Tensor factorization for HighER-order feature learning (PANTHER). PANTHER selects informative genetic pathways that directly encode molecular mechanisms. We apply genetically motivated constrained tensor factorization to group pathways in a way that reflects molecular mechanism interactions. We then train a softmax classifier for disease types using the identified pathway groups. We evaluated PANTHER against multiple state-of-the-art constrained tensor/matrix factorization models, as well as group guided and Bayesian hierarchical models. PANTHER outperforms all state-of-the-art comparison models significantly (p<0.05). Our experiments on large scale Next Generation Sequencing (NGS) and whole-genome genotyping datasets also demonstrated wide applicability of PANTHER. We performed feature analysis in predicting disease types, which suggested insights and benefits of the identified pathway groups.

LGOct 12, 2020
Towards Expressive Graph Representation

Chengsheng Mao, Liang Yao, Yuan Luo

Graph Neural Network (GNN) aggregates the neighborhood of each node into the node embedding and shows its powerful capability for graph representation learning. However, most existing GNN variants aggregate the neighborhood information in a fixed non-injective fashion, which may map different graphs or nodes to the same embedding, reducing the model expressiveness. We present a theoretical framework to design a continuous injective set function for neighborhood aggregation in GNN. Using the framework, we propose expressive GNN that aggregates the neighborhood of each node with a continuous injective set function, so that a GNN layer maps similar nodes with similar neighborhoods to similar embeddings, different nodes to different embeddings and the equivalent nodes or isomorphic graphs to the same embeddings. Moreover, the proposed expressive GNN can naturally learn expressive representations for graphs with continuous node attributes. We validate the proposed expressive GNN (ExpGNN) for graph classification on multiple benchmark datasets including simple graphs and attributed graphs. The experimental results demonstrate that our model achieves state-of-the-art performances on most of the benchmarks.