Christopher C. Yang

LG
h-index: 7
Papers: 13
Citations: 62
Novelty: 51%
AI Score: 47

13 Papers

CL · Jun 23, 2022
Constructing Cross-lingual Consumer Health Vocabulary with Word-Embedding from Comparable User Generated Content

Chia-Hsuan Chang, Lei Wang, Christopher C. Yang

The online health community (OHC) is the primary channel for laypeople to share health information. To analyze the health consumer-generated content (HCGC) from OHCs, identifying the colloquial medical expressions used by laypeople is a critical challenge. The open-access and collaborative consumer health vocabulary (OAC CHV) is the controlled vocabulary for addressing this challenge. Nevertheless, OAC CHV is only available in English, limiting its applicability to other languages. This research proposes a cross-lingual automatic term recognition framework for extending the English CHV into a cross-lingual one. Our framework requires an English HCGC corpus and a non-English (Chinese in this study) HCGC corpus as inputs. Two monolingual word vector spaces are learned using the skip-gram algorithm so that each space encodes common word associations from laypeople within a language. Based on the isometry assumption, the framework aligns the two monolingual spaces into a bilingual word vector space, where we employ cosine similarity as a metric for identifying semantically similar words across languages. The experimental results demonstrate that our framework outperforms two large language model baselines in identifying CHV across languages. Our framework only requires raw HCGC corpora and a small set of medical translations, reducing the human effort needed to compile a cross-lingual CHV.
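The alignment-then-retrieval step described in this abstract can be illustrated with a toy sketch. It assumes the alignment is a least-squares rotation fitted on a small seed dictionary, which is one common reading of the isometry assumption; the abstract does not say which solver the authors use. The vectors, the romanized Chinese terms, and the function names are all hypothetical, and the example is restricted to two dimensions so the rotation has a closed form (real embeddings would typically use an SVD-based orthogonal Procrustes solution).

```python
import math

def fit_rotation(src, tgt):
    """Angle of the least-squares 2-D rotation that maps src vectors onto tgt vectors."""
    a = sum(x1 * y1 + x2 * y2 for (x1, x2), (y1, y2) in zip(src, tgt))
    b = sum(x1 * y2 - x2 * y1 for (x1, x2), (y1, y2) in zip(src, tgt))
    return math.atan2(b, a)

def rotate(v, theta):
    """Rotate a 2-D vector counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def cosine(u, v):
    """Cosine similarity between two non-zero 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def nearest(query, space):
    """Word in `space` whose vector is most cosine-similar to `query`."""
    return max(space, key=lambda w: cosine(query, space[w]))

# Hypothetical toy embeddings: the "Chinese" space is the English one rotated by 90 degrees.
en = {"heart attack": (1.0, 0.0), "high blood pressure": (0.0, 1.0), "flu": (0.7, 0.7)}
zh = {"xinji gengse": (0.0, 1.0), "gaoxueya": (-1.0, 0.0), "liugan": (-0.7, 0.7)}

# Small seed dictionary of known translations (the "limited" medical translations).
seed = [("heart attack", "xinji gengse"), ("high blood pressure", "gaoxueya")]
theta = fit_rotation([en[e] for e, _ in seed], [zh[z] for _, z in seed])

# Map an unseen English term into the shared space and retrieve its nearest Chinese term.
match = nearest(rotate(en["flu"], theta), zh)
print(match)
```

Only the two seed pairs are used to fit the rotation; the third term is recovered purely through cosine similarity in the aligned space.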

AI · Nov 2, 2025 · Code
Knowledge Elicitation with Large Language Models for Interpretable Cancer Stage Identification from Pathology Reports

Yeawon Lee, Christopher C. Yang, Chia-Hsuan Chang et al.

Cancer staging is critical for patient prognosis and treatment planning, yet extracting pathologic TNM staging from unstructured pathology reports poses a persistent challenge. Existing natural language processing (NLP) and machine learning (ML) strategies often depend on large annotated datasets, limiting their scalability and adaptability. In this study, we introduce two Knowledge Elicitation methods designed to overcome these limitations by enabling large language models (LLMs) to induce and apply domain-specific rules for cancer staging. The first, Knowledge Elicitation with Long-Term Memory (KEwLTM), uses an iterative prompting strategy to derive staging rules directly from unannotated pathology reports, without requiring ground-truth labels. The second, Knowledge Elicitation with Retrieval-Augmented Generation (KEwRAG), employs a variation of RAG where rules are pre-extracted from relevant guidelines in a single step and then applied, enhancing interpretability and avoiding repeated retrieval overhead. We leverage the ability of LLMs to apply broad knowledge learned during pre-training to new tasks. Using breast cancer pathology reports from the TCGA dataset, we evaluate their performance in identifying T and N stages, comparing them against various baseline approaches on two open-source LLMs. Our results indicate that KEwLTM outperforms KEwRAG when Zero-Shot Chain-of-Thought (ZSCOT) inference is effective, whereas KEwRAG achieves better performance when ZSCOT inference is less effective. Both methods offer transparent, interpretable interfaces by making the induced rules explicit. These findings highlight the promise of our Knowledge Elicitation methods as scalable, high-performing solutions for automated cancer staging with enhanced interpretability, particularly in clinical settings with limited annotated data.

CL · Apr 2, 2024 · Code
Classifying Cancer Stage with Open-Source Clinical Large Language Models

Chia-Hsuan Chang, Mary M. Lucas, Grace Lu-Yao et al.

Cancer stage classification is important for making treatment and care management plans for oncology patients. Information on staging is often included in unstructured form in clinical, pathology, radiology and other free-text reports in the electronic health record system, requiring extensive work to parse and obtain. To facilitate the extraction of this information, previous NLP approaches rely on labeled training datasets, which are labor-intensive to prepare. In this study, we demonstrate that without any labeled training data, open-source clinical large language models (LLMs) can extract pathologic tumor-node-metastasis (pTNM) staging information from real-world pathology reports. Our experiments compare LLMs and a BERT-based model fine-tuned using the labeled data. Our findings suggest that while LLMs still exhibit subpar performance in Tumor (T) classification, with the appropriate adoption of prompting strategies, they can achieve comparable performance on Metastasis (M) classification and improved performance on Node (N) classification.

19.9 · AI · Mar 28
MediHive: A Decentralized Agent Collective for Medical Reasoning

Xiaoyang Wang, Christopher C. Yang

Large language models (LLMs) have revolutionized medical reasoning tasks, yet single-agent systems often falter on complex, interdisciplinary problems requiring robust handling of uncertainty and conflicting evidence. Multi-agent systems (MAS) leveraging LLMs enable collaborative intelligence, but prevailing centralized architectures suffer from scalability bottlenecks, single points of failure, and role confusion in resource-constrained environments. Decentralized MAS (D-MAS) promise enhanced autonomy and resilience via peer-to-peer interactions, but their application to high-stakes healthcare domains remains underexplored. We introduce MediHive, a novel decentralized multi-agent framework for medical question answering that integrates a shared memory pool with iterative fusion mechanisms. MediHive deploys LLM-based agents that autonomously self-assign specialized roles, conduct initial analyses, detect divergences through conditional evidence-based debates, and locally fuse peer insights over multiple rounds to achieve consensus. Empirically, MediHive outperforms single-LLM and centralized baselines on MedQA and PubMedQA datasets, attaining accuracies of 84.3% and 78.4%, respectively. Our work advances scalable, fault-tolerant D-MAS for medical AI, addressing key limitations of centralized designs while demonstrating superior performance in reasoning-intensive tasks.

CL · Apr 19, 2024
Beyond Self-Consistency: Ensemble Reasoning Boosts Consistency and Accuracy of LLMs in Cancer Staging

Chia-Hsuan Chang, Mary M. Lucas, Yeawon Lee et al.

Advances in large language models (LLMs) have encouraged their adoption in the healthcare domain, where vital clinical information is often contained in unstructured notes. Cancer staging status is available in clinical reports, but extracting it from the unstructured text requires natural language processing. With the advance of clinically oriented LLMs, it appears promising to extract such status without extensive effort in training the algorithms. Prompting approaches that elicit a pre-trained LLM's reasoning process, such as chain-of-thought, may help to improve the trustworthiness of the generated responses. Using self-consistency further improves model performance, but often results in inconsistent generations across the multiple reasoning paths. In this study, we propose an ensemble reasoning approach with the aim of improving the consistency of the model generations. Using an open-access clinical large language model to determine the pathologic cancer stage from real-world pathology reports, we show that the ensemble reasoning approach improves both the consistency and performance of the LLM in determining cancer stage, thereby demonstrating the potential to use these models in clinical or other domains where reliability and trustworthiness are critical.
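For readers unfamiliar with the self-consistency baseline mentioned here, a minimal sketch follows: several reasoning paths are sampled, and the most frequent final answer is kept. The agreement rate returned alongside the answer is a hypothetical, simple way to quantify how consistent the paths are. The ensemble reasoning method itself is not specified in the abstract and is not reproduced here; the stage labels are made-up sample data.

```python
from collections import Counter

def self_consistency(answers):
    """Majority-vote answer over sampled reasoning paths, plus the share of paths that agree."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)

# Hypothetical final answers extracted from five sampled chain-of-thought generations.
sampled = ["T2", "T2", "T1", "T2", "T3"]
stage, agreement = self_consistency(sampled)
print(stage, agreement)
```

A low agreement rate signals the kind of inconsistency across reasoning paths that the abstract sets out to reduce.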

LG · Apr 19, 2024
Explainable AI for Fair Sepsis Mortality Predictive Model

Chia-Hsuan Chang, Xiaoyang Wang, Christopher C. Yang

Artificial intelligence supports healthcare professionals with predictive modeling, greatly transforming clinical decision-making. This study addresses the crucial need for fairness and explainability in AI applications within healthcare to ensure equitable outcomes across diverse patient demographics. By focusing on the predictive modeling of sepsis-related mortality, we propose a method that learns a performance-optimized predictive model and then employs a transfer learning process to produce a model with better fairness. Our method also introduces a novel permutation-based feature importance algorithm that aims to elucidate the contribution of each feature to enhancing the fairness of predictions. Unlike existing explainability methods, which concentrate on explaining feature contribution to predictive performance, our proposed method bridges the gap in understanding how each feature contributes to fairness. This advancement is pivotal, given sepsis's significant mortality rate and its role in one-third of hospital deaths. Our method not only aids in identifying and mitigating biases within the predictive model but also fosters trust among healthcare stakeholders by improving the transparency and fairness of model predictions, thereby contributing to more equitable and trustworthy healthcare delivery.
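The general idea of permutation-based importance applied to a fairness metric, rather than to accuracy, can be sketched as follows. This is not the authors' algorithm, whose details the abstract does not give. The sketch assumes a demographic parity gap as a stand-in fairness metric, a trivial threshold rule as the "model", and made-up patient rows; every name is hypothetical.

```python
import random

def parity_gap(preds, groups):
    """Gap in positive-prediction rate between the groups present (stand-in fairness metric)."""
    rate = {}
    for g in set(groups):
        members = [p for p, v in zip(preds, groups) if v == g]
        rate[g] = sum(members) / len(members)
    return max(rate.values()) - min(rate.values())

def fairness_importance(model, rows, groups, feature, repeats=20, seed=0):
    """Mean change in the parity gap when one feature column is randomly permuted."""
    rng = random.Random(seed)
    base = parity_gap([model(r) for r in rows], groups)
    deltas = []
    for _ in range(repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        shuffled = [dict(r, **{feature: v}) for r, v in zip(rows, column)]
        deltas.append(parity_gap([model(r) for r in shuffled], groups) - base)
    return sum(deltas) / repeats

# Hypothetical data: the rule depends only on lactate, which happens to separate the groups.
rows = [
    {"lactate": 3.0, "age": 70},
    {"lactate": 4.0, "age": 55},
    {"lactate": 1.0, "age": 62},
    {"lactate": 1.5, "age": 48},
]
groups = ["A", "A", "B", "B"]

def model(row):
    return 1 if row["lactate"] > 2.0 else 0

lactate_effect = fairness_importance(model, rows, groups, "lactate")
age_effect = fairness_importance(model, rows, groups, "age")
print(lactate_effect, age_effect)
```

Under this sign convention, a negative value means that breaking the feature's link to the outcome shrinks the disparity, so the feature contributes to unfairness; a feature the model ignores scores exactly zero.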

LG · Apr 4, 2024
An ExplainableFair Framework for Prediction of Substance Use Disorder Treatment Completion

Mary M. Lucas, Xiaoyang Wang, Chia-Hsuan Chang et al.

Fairness of machine learning models in healthcare has drawn increasing attention from clinicians, researchers, and even the highest levels of government. At the same time, the importance of developing and deploying interpretable or explainable models has been demonstrated, and such models are essential to increasing trustworthiness and the likelihood of adoption. The objective of this study was to develop and implement a framework for addressing both these issues - fairness and explainability. We propose an explainable fairness framework, first developing a model with optimized performance, and then using an in-processing approach to mitigate model biases relative to the sensitive attributes of race and sex. We then explore and visualize explanations of the model changes that lead to the fairness enhancement by examining the changes in feature importance. Our resulting fairness-enhanced models retain high sensitivity with improved fairness, and the explanations of the fairness enhancement may provide helpful insights for healthcare providers to guide clinical decision-making and resource allocation.

LG · Apr 19, 2025
Balancing Fairness and Performance in Healthcare AI: A Gradient Reconciliation Approach

Xiaoyang Wang, Christopher C. Yang

The rapid growth of healthcare data and advances in computational power have accelerated the adoption of artificial intelligence (AI) in medicine. However, AI systems deployed without explicit fairness considerations risk exacerbating existing healthcare disparities, potentially leading to inequitable resource allocation and diagnostic disparities across demographic subgroups. To address this challenge, we propose FairGrad, a novel gradient reconciliation framework that automatically balances predictive performance and multi-attribute fairness optimization in healthcare AI models. Our method resolves conflicting optimization objectives by projecting each gradient vector onto the orthogonal plane of the others, thereby regularizing the optimization trajectory to ensure equitable consideration of all objectives. Evaluated on diverse real-world healthcare datasets and predictive tasks - including Substance Use Disorder (SUD) treatment and sepsis mortality - FairGrad achieved statistically significant improvements in multi-attribute fairness metrics (e.g., equalized odds) while maintaining competitive predictive accuracy. These results demonstrate the viability of harmonizing fairness and utility in mission-critical medical AI applications.
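The gradient projection described in this abstract can be sketched in a few lines. The version below follows the widely used PCGrad-style rule, in which a gradient is projected onto the plane orthogonal to another gradient only when the two conflict (negative dot product). Whether FairGrad applies the projection conditionally, and in what order, is an assumption here, since the abstract does not say. The two-dimensional gradients and function names are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def project_out(g, other):
    """Project g onto the plane orthogonal to `other` when the two gradients conflict."""
    d = dot(g, other)
    if d >= 0:  # no conflict (or `other` is zero): leave g unchanged
        return list(g)
    scale = d / dot(other, other)
    return [a - scale * b for a, b in zip(g, other)]

def reconcile(grads):
    """Adjust each objective's gradient against all others, then sum into one update."""
    adjusted = []
    for i, g in enumerate(grads):
        g_adj = list(g)
        for j, other in enumerate(grads):
            if i != j:
                g_adj = project_out(g_adj, other)
        adjusted.append(g_adj)
    return [sum(component) for component in zip(*adjusted)]

# Hypothetical gradients: one for predictive loss, one for a fairness loss; they conflict.
g_perf = [1.0, 0.0]
g_fair = [-1.0, 1.0]
update = reconcile([g_perf, g_fair])
print(update)
```

In this toy case the combined update has a non-negative dot product with both original gradients, so a step along it does not directly oppose either objective.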

AI · Aug 29, 2025
Automated Clinical Problem Detection from SOAP Notes using a Collaborative Multi-Agent LLM Architecture

Yeawon Lee, Xiaoyang Wang, Christopher C. Yang

Accurate interpretation of clinical narratives is critical for patient care, but the complexity of these notes makes automation challenging. While Large Language Models (LLMs) show promise, single-model approaches can lack the robustness required for high-stakes clinical tasks. We introduce a collaborative multi-agent system (MAS) that models a clinical consultation team to address this gap. The system is tasked with identifying clinical problems by analyzing only the Subjective (S) and Objective (O) sections of SOAP notes, simulating the diagnostic reasoning process of synthesizing raw data into an assessment. A Manager agent orchestrates a dynamically assigned team of specialist agents who engage in a hierarchical, iterative debate to reach a consensus. We evaluated our MAS against a single-agent baseline on a curated dataset of 420 MIMIC-III notes. The dynamic multi-agent configuration demonstrated consistently improved performance in identifying congestive heart failure, acute kidney injury, and sepsis. Qualitative analysis of the agent debates reveals that this structure effectively surfaces and weighs conflicting evidence, though it can occasionally be susceptible to groupthink. By modeling a clinical team's reasoning process, our system offers a promising path toward more accurate, robust, and interpretable clinical decision support tools.

LG · Aug 29, 2025
MoE-Health: A Mixture of Experts Framework for Robust Multimodal Healthcare Prediction

Xiaoyang Wang, Christopher C. Yang

Healthcare systems generate diverse multimodal data, including Electronic Health Records (EHR), clinical notes, and medical images. Effectively leveraging this data for clinical prediction is challenging, particularly as real-world samples often present with varied or incomplete modalities. Existing approaches typically require complete modality data or rely on manual selection strategies, limiting their applicability in real-world clinical settings where data availability varies across patients and institutions. To address these limitations, we propose MoE-Health, a novel Mixture of Experts framework designed for robust multimodal fusion in healthcare prediction. The MoE-Health architecture is specifically developed to handle samples with differing modalities and improve performance on critical clinical tasks. By leveraging specialized expert networks and a dynamic gating mechanism, our approach selects and combines relevant experts based on the available data modalities, enabling flexible adaptation to varying data availability scenarios. We evaluate MoE-Health on the MIMIC-IV dataset across three critical clinical prediction tasks: in-hospital mortality prediction, long length of stay, and hospital readmission prediction. Experimental results demonstrate that MoE-Health achieves superior performance compared to existing multimodal fusion methods while maintaining robustness across different modality availability patterns. The framework effectively integrates multimodal information, offering improved predictive performance and robustness in handling heterogeneous and incomplete healthcare data, which makes it particularly suitable for deployment in diverse healthcare environments.
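One simple way to picture a gate that adapts to missing modalities is a softmax restricted to the experts whose inputs are present. The sketch below assumes one expert per modality, each producing a scalar risk score; the abstract does not describe the actual expert layout or gating network, so the names, logits, and scores are hypothetical.

```python
import math

def masked_softmax(logits, available):
    """Softmax over gate logits, restricted to modalities that are actually present."""
    present = [m for m in logits if available.get(m)]
    if not present:
        raise ValueError("at least one modality must be available")
    peak = max(logits[m] for m in present)  # subtract the max for numerical stability
    exps = {m: math.exp(logits[m] - peak) for m in present}
    total = sum(exps.values())
    return {m: exps[m] / total for m in present}

def fuse(expert_outputs, logits, available):
    """Gate-weighted combination of the outputs of the available experts."""
    weights = masked_softmax(logits, available)
    fused = sum(weights[m] * expert_outputs[m] for m in weights)
    return fused, weights

# Hypothetical per-modality expert scores and gate logits for one patient without imaging.
expert_outputs = {"ehr": 0.2, "notes": 0.6, "image": 0.9}
gate_logits = {"ehr": 1.0, "notes": 1.0, "image": 3.0}
available = {"ehr": True, "notes": True, "image": False}

risk, weights = fuse(expert_outputs, gate_logits, available)
print(risk, weights)
```

Because the missing modality is excluded before normalization, its large gate logit has no influence and the remaining weights still sum to one.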

CL · May 22, 2025
Collaboration among Multiple Large Language Models for Medical Question Answering

Kexin Shang, Chia-Hsuan Chang, Christopher C. Yang

Empowered by a vast internal knowledge reservoir, the new generation of large language models (LLMs) demonstrates untapped potential to tackle medical tasks. However, insufficient effort has been made towards eliciting a synergistic effect from multiple LLMs' expertise and background. In this study, we propose a multi-LLM collaboration framework tailored to a medical multiple-choice question dataset. Through post-hoc analysis of 3 pre-trained LLM participants, our framework is shown to boost all LLMs' reasoning ability as well as to reduce their divergence across questions. We also measure an LLM's confidence when it is confronted with opposing opinions from other LLMs and observe a concurrence between the LLM's confidence and prediction accuracy.

LG · Jan 22, 2025
Enhancing Multi-Attribute Fairness in Healthcare Predictive Modeling

Xiaoyang Wang, Christopher C. Yang

Artificial intelligence (AI) systems in healthcare have demonstrated remarkable potential to improve patient outcomes. However, if not designed with fairness in mind, they also carry the risks of perpetuating or exacerbating existing health disparities. Although numerous fairness-enhancing techniques have been proposed, most focus on a single sensitive attribute and neglect the broader impact that optimizing fairness for one attribute may have on the fairness of other sensitive attributes. In this work, we introduce a novel approach to multi-attribute fairness optimization in healthcare AI, tackling fairness concerns across multiple demographic attributes concurrently. Our method follows a two-phase approach: initially optimizing for predictive performance, followed by fine-tuning to achieve fairness across multiple sensitive attributes. We develop our proposed method using two strategies, sequential and simultaneous. Our results show a significant reduction in Equalized Odds Disparity (EOD) for multiple attributes, while maintaining high predictive accuracy. Notably, we demonstrate that single-attribute fairness methods can inadvertently increase disparities in non-targeted attributes whereas simultaneous multi-attribute optimization achieves more balanced fairness improvements across all attributes. These findings highlight the importance of comprehensive fairness strategies in healthcare AI and offer promising directions for future research in this critical area.
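To make the fairness metric concrete, the sketch below computes an equalized odds disparity separately for two sensitive attributes. It assumes the disparity is the larger of the between-group gaps in true-positive rate and false-positive rate; the abstract does not give the exact definition of EOD used, and some works average the two gaps instead. The labels, predictions, and group assignments are made-up sample data.

```python
def rates(y_true, y_pred):
    """True-positive rate and false-positive rate for binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

def equalized_odds_disparity(y_true, y_pred, groups):
    """Larger of the TPR gap and the FPR gap across the groups of one sensitive attribute."""
    per_group = []
    for g in set(groups):
        idx = [i for i, v in enumerate(groups) if v == g]
        per_group.append(rates([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    tprs = [r[0] for r in per_group]
    fprs = [r[1] for r in per_group]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))

# Hypothetical labels, predictions, and two sensitive attributes for eight patients.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
sex = ["F", "F", "F", "F", "M", "M", "M", "M"]
race = ["A", "B", "A", "B", "A", "B", "A", "B"]

eod_sex = equalized_odds_disparity(y_true, y_pred, sex)
eod_race = equalized_odds_disparity(y_true, y_pred, race)
print(eod_sex, eod_race)
```

Reporting the disparity per attribute, as done here, is what makes it possible to see whether improving fairness for one attribute worsens it for another.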

SI · May 12, 2021
Frequent Pattern Mining in Continuous-time Temporal Networks

Ali Jazayeri, Christopher C. Yang

Networks are used as highly expressive tools in different disciplines. In recent years, the analysis and mining of temporal networks have attracted substantial attention. Frequent pattern mining is considered an essential task in the network science literature. In addition to its numerous applications, the investigation of frequent pattern mining in networks directly impacts other analytical approaches, such as clustering, quasi-clique and clique mining, and link prediction. In nearly all the algorithms proposed for frequent pattern mining in temporal networks, the networks are represented as sequences of static networks. Then, the inter- or intra-network patterns are mined. This type of representation imposes a computation-expressiveness trade-off on the mining problem. In this paper, we propose a novel representation that can preserve the temporal aspects of the network losslessly. Then, we introduce the concept of constrained interval graphs (CIGs). Next, we develop a series of algorithms for mining the complete set of frequent temporal patterns in a temporal network data set. We also consider four different definitions of isomorphism to allow noise tolerance in temporal data collection. Applying the algorithms to three real-world data sets demonstrates their practicality and their capability to discover unknown patterns in various settings.