Yang Dai

7.8CVMar 10

VIVID-Med: LLM-Supervised Structured Pretraining for Deployable Medical ViTs

Xiyao Wang, Xiaoyu Tan, Yang Dai et al.

Vision-language pretraining has driven significant progress in medical image analysis. However, current methods typically supervise visual encoders using one-hot labels or free-form text, neither of which effectively captures the complex semantic relationships among clinical findings. In this study, we introduce VIVID-Med, a novel framework that leverages a frozen large language model (LLM) as a structured semantic teacher to pretrain medical vision transformers (ViTs). VIVID-Med translates clinical findings into verifiable JSON field-state pairs via a Unified Medical Schema (UMS), utilizing answerability-aware masking to focus optimization. It then employs Structured Prediction Decomposition (SPD) to partition cross-attention into orthogonality-regularized query groups, extracting complementary visual aspects. Crucially, the LLM is discarded post-training, yielding a lightweight, deployable ViT-only backbone. We evaluated VIVID-Med across multiple settings: on CheXpert linear probing, it achieves a macro-AUC of 0.8588, outperforming BiomedCLIP by +6.65 points while using 500x less data. It also demonstrates robust zero-shot cross-domain transfer to NIH ChestX-ray14 (0.7225 macro-AUC) and strong cross-modality generalization to CT, achieving 0.8413 AUC on LIDC-IDRI lung nodule classification and 0.9969 macro-AUC on OrganAMNIST 11-organ classification. VIVID-Med offers a highly efficient, scalable alternative to deploying resource-heavy vision-language models in clinical settings.

1.2TOJan 20, 2025Code

Prediction of Lung Metastasis from Hepatocellular Carcinoma using the SEER Database

Jeff J. H. Kim, George R. Nahass, Yang Dai et al.

Hepatocellular carcinoma (HCC) is a leading cause of cancer-related mortality, with lung metastases being the most common site of distant spread and significantly worsening prognosis. Despite the growing availability of clinical and demographic data, predictive models for lung metastasis in HCC remain limited in scope and clinical applicability. In this study, we develop and validate an end-to-end machine learning pipeline using data from the Surveillance, Epidemiology, and End Results (SEER) database. We evaluated three machine learning models (Random Forest, XGBoost, and Logistic Regression) alongside a multilayer perceptron (MLP) neural network. Our models achieved high AUROC values and recall, with the Random Forest and MLP models demonstrating the best overall performance (AUROC = 0.82). However, the low precision across models highlights the challenges of accurately predicting positive cases. To address these limitations, we developed a custom loss function incorporating recall optimization, enabling the MLP model to achieve the highest sensitivity. An ensemble approach further improved overall recall by leveraging the strengths of individual models. Feature importance analysis revealed key predictors such as surgery status, tumor staging, and follow up duration, emphasizing the relevance of clinical interventions and disease progression in metastasis prediction. While this study demonstrates the potential of machine learning for identifying high-risk patients, limitations include reliance on imbalanced datasets, incomplete feature annotations, and the low precision of predictions. Future work should leverage the expanding SEER dataset, improve data imputation techniques, and explore advanced pre-trained models to enhance predictive accuracy and clinical utility.

Yang Dai

2 Papers