Hao-Ren Yao

LG
h-index50
8papers
1,018citations
Novelty49%
AI Score42

8 Papers

CLSep 1, 2022
Distilling Multi-Scale Knowledge for Event Temporal Relation Extraction

Hao-Ren Yao, Luke Breitfeller, Aakanksha Naik et al. · cmu

Event Temporal Relation Extraction (ETRE) is paramount but challenging. Within a discourse, event pairs are situated at different distances or the so-called proximity bands. The temporal ordering communicated about event pairs where at more remote (i.e., ``long'') or less remote (i.e., ``short'') proximity bands are encoded differently. SOTA models have tended to perform well on events situated at either short or long proximity bands, but not both. Nonetheless, real-world, natural texts contain all types of temporal event-pairs. In this paper, we present MulCo: Distilling Multi-Scale Knowledge via Contrastive Learning, a knowledge co-distillation approach that shares knowledge across multiple event pair proximity bands to improve performance on all types of temporal datasets. Our experimental results show that MulCo successfully integrates linguistic cues pertaining to temporal reasoning across both short and long proximity bands and achieves new state-of-the-art results on several ETRE benchmark datasets.

LGSep 1, 2022
Self-supervised Representation Learning on Electronic Health Records with Graph Kernel Infomax

Hao-Ren Yao, Nairen Cao, Katina Russell et al.

Learning Electronic Health Records (EHRs) representation is a preeminent yet under-discovered research topic. It benefits various clinical decision support applications, e.g., medication outcome prediction or patient similarity search. Current approaches focus on task-specific label supervision on vectorized sequential EHR, which is not applicable to large-scale unsupervised scenarios. Recently, contrastive learning shows great success on self-supervised representation learning problems. However, complex temporality often degrades the performance. We propose Graph Kernel Infomax, a self-supervised graph kernel learning approach on the graphical representation of EHR, to overcome the previous problems. Unlike the state-of-the-art, we do not change the graph structure to construct augmented views. Instead, we use Kernel Subspace Augmentation to embed nodes into two geometrically different manifold views. The entire framework is trained by contrasting nodes and graph representations on those two manifold views through the commonly used contrastive objectives. Empirically, using publicly available benchmark EHR datasets, our approach yields performance on clinical downstream tasks that exceeds the state-of-the-art. Theoretically, the variation on distance metrics naturally creates different views as data augmentation without changing graph structures.

LGMay 30, 2025Code
Intercept Cancer: Cancer Pre-Screening with Large Scale Healthcare Foundation Models

Liwen Sun, Hao-Ren Yao, Gary Gao et al.

Cancer screening, leading to early detection, saves lives. Unfortunately, existing screening techniques require expensive and intrusive medical procedures, not globally available, resulting in too many lost would-be-saved lives. We present CATCH-FM, CATch Cancer early with Healthcare Foundation Models, a cancer pre-screening methodology that identifies high-risk patients for further screening solely based on their historical medical records. With millions of electronic healthcare records (EHR), we establish the scaling law of EHR foundation models pretrained on medical code sequences, pretrain compute-optimal foundation models of up to 2.4 billion parameters, and finetune them on clinician-curated cancer risk prediction cohorts. In our retrospective evaluation comprising of thirty thousand patients, CATCH-FM achieves strong efficacy, with 50% sensitivity in predicting first cancer risks at 99% specificity cutoff, and outperforming feature-based tree models and both general and medical LLMs by up to 20% AUPRC. Despite significant demographic, healthcare system, and EHR coding differences, CATCH-FM achieves state-of-the-art pancreatic cancer risk prediction on the EHRSHOT few-shot leaderboard, outperforming EHR foundation models pretrained using on-site patient data. Our analysis demonstrates the robustness of CATCH-FM in various patient distributions, the benefits of operating in the ICD code space, and its ability to capture non-trivial cancer risk factors. Our code will be open-sourced.

AIJan 30
Task-Aware LLM Council with Adaptive Decision Pathways for Decision Support

Wei Zhu, Lixing Yu, Hao-Ren Yao et al.

Large language models (LLMs) have shown strong capabilities across diverse decision-making tasks. However, existing approaches often overlook the specialization differences among available models, treating all LLMs as uniformly applicable regardless of task characteristics. This limits their ability to adapt to varying reasoning demands and task complexities. In this work, we propose Task-Aware LLM Council (TALC), a task-adaptive decision framework that integrates a council of LLMs with Monte Carlo Tree Search (MCTS) to enable dynamic expert selection and efficient multi-step planning. Each LLM is equipped with a structured success memory profile derived from prior task trajectories, enabling semantic matching between current reasoning context and past successes. At each decision point, TALC routes control to the most contextually appropriate model and estimates node value using a dual-signal mechanism that fuses model-based evaluations with historical utility scores. These signals are adaptively weighted based on intra-node variance and used to guide MCTS selection, allowing the system to balance exploration depth with planning confidence. Experiments on WebShop, HumanEval, and the Game of 24 demonstrate that TALC achieves superior task success rates and improved search efficiency compared to strong baselines, validating the benefits of specialization-aware routing and adaptive planning.

MEFeb 13, 2024
Perturbative partial moment matching and gradient-flow adaptive importance sampling transformations for Bayesian leave one out cross-validation

Joshua C Chang, Xiangting Li, Shixin Xu et al.

Importance sampling (IS) allows one to approximate leave one out (LOO) cross-validation for a Bayesian model, without refitting, by inverting the Bayesian update equation to subtract a given data point from a model posterior. For each data point, one computes expectations under the corresponding LOO posterior by weighted averaging over the full data posterior. This task sometimes requires weight stabilization in the form of adapting the posterior distribution via transformation. So long as one is successful in finding a suitable transformation, one avoids refitting. To this end, we motivate the use of bijective perturbative transformations of the form $T(\boldsymbolθ)=\boldsymbolθ + h Q(\boldsymbolθ),$ for $0<h\ll 1,$ and introduce two classes of such transformations: 1) partial moment matching and 2) gradient flow evolution. The former extends prior literature on moment-matching under the recognition that adaptation for LOO is a small perturbation on the full data posterior. The latter class of methods define transformations based on relaxing various statistical objectives: in our case the variance of the IS estimator and the KL divergence between the transformed distribution and the statistics of the LOO fold. Being model-specific, the gradient flow transformations require evaluating Jacobian determinants. While these quantities are generally readily available through auto-differentiation, we derive closed-form expressions in the case of logistic regression and shallow ReLU activated neural networks. We tested the methodology on an $n\ll p$ dataset that is known to produce unstable LOO IS weights.

LGFeb 4, 2021
The Analysis from Nonlinear Distance Metric to Kernel-based Drug Prescription Prediction System

Der-Chen Chang, Ophir Frieder, Chi-Feng Hung et al.

Distance metrics and their nonlinear variant play a crucial role in machine learning based real-world problem solving. We demonstrated how Euclidean and cosine distance measures differ not only theoretically but also in real-world medical application, namely, outcome prediction of drug prescription. Euclidean distance exhibits favorable properties in the local geometry problem. To this regard, Euclidean distance can be applied under short-term disease with low-variation outcome observation. Moreover, when presenting to highly variant chronic disease, it is preferable to use cosine distance. These different geometric properties lead to different submanifolds in the original embedded space, and hence, to different optimizing nonlinear kernel embedding frameworks. We first established the geometric properties that we needed in these frameworks. From these properties interpreted their differences in certain perspectives. Our evaluation on real-world, large-scale electronic health records and embedding space visualization empirically validated our approach.

LGAug 4, 2020
Cross-Global Attention Graph Kernel Network Prediction of Drug Prescription

Hao-Ren Yao, Der-Chen Chang, Ophir Frieder et al.

We present an end-to-end, interpretable, deep-learning architecture to learn a graph kernel that predicts the outcome of chronic disease drug prescription. This is achieved through a deep metric learning collaborative with a Support Vector Machine objective using a graphical representation of Electronic Health Records. We formulate the predictive model as a binary graph classification problem with an adaptive learned graph kernel through novel cross-global attention node matching between patient graphs, simultaneously computing on multiple graphs without training pair or triplet generation. Results using the Taiwanese National Health Insurance Research Database demonstrate that our approach outperforms current start-of-the-art models both in terms of accuracy and interpretability.

CLJul 28, 2020
GUIR at SemEval-2020 Task 12: Domain-Tuned Contextualized Models for Offensive Language Detection

Sajad Sotudeh, Tong Xiang, Hao-Ren Yao et al.

Offensive language detection is an important and challenging task in natural language processing. We present our submissions to the OffensEval 2020 shared task, which includes three English sub-tasks: identifying the presence of offensive language (Sub-task A), identifying the presence of target in offensive language (Sub-task B), and identifying the categories of the target (Sub-task C). Our experiments explore using a domain-tuned contextualized language model (namely, BERT) for this task. We also experiment with different components and configurations (e.g., a multi-view SVM) stacked upon BERT models for specific sub-tasks. Our submissions achieve F1 scores of 91.7% in Sub-task A, 66.5% in Sub-task B, and 63.2% in Sub-task C. We perform an ablation study which reveals that domain tuning considerably improves the classification performance. Furthermore, error analysis shows common misclassification errors made by our model and outlines research directions for future.