CLMar 26, 2025
ProtoBERT-LoRA: Parameter-Efficient Prototypical Finetuning for Immunotherapy Study IdentificationShijia Zhang, Xiyu Ding, Kai Ding et al.
Identifying immune checkpoint inhibitor (ICI) studies in genomic repositories like Gene Expression Omnibus (GEO) is vital for cancer research yet remains challenging due to semantic ambiguity, extreme class imbalance, and limited labeled data in low-resource settings. We present ProtoBERT-LoRA, a hybrid framework that combines PubMedBERT with prototypical networks and Low-Rank Adaptation (LoRA) for efficient fine-tuning. The model enforces class-separable embeddings via episodic prototype training while preserving biomedical domain knowledge. Our dataset was divided as: Training (20 positive, 20 negative), Prototype Set (10 positive, 10 negative), Validation (20 positive, 200 negative), and Test (71 positive, 765 negative). Evaluated on test dataset, ProtoBERT-LoRA achieved F1-score of 0.624 (precision: 0.481, recall: 0.887), outperforming the rule-based system, machine learning baselines and finetuned PubMedBERT. Application to 44,287 unlabeled studies reduced manual review efforts by 82%. Ablation studies confirmed that combining prototypes with LoRA improved performance by 29% over stand-alone LoRA.
IVMar 1, 2025
Cross-Attention Fusion of MRI and Jacobian Maps for Alzheimer's Disease DiagnosisShijia Zhang, Xiyu Ding, Brian Caffo et al.
Early diagnosis of Alzheimer's disease (AD) is critical for intervention before irreversible neurodegeneration occurs. Structural MRI (sMRI) is widely used for AD diagnosis, but conventional deep learning approaches primarily rely on intensity-based features, which require large datasets to capture subtle structural changes. Jacobian determinant maps (JSM) provide complementary information by encoding localized brain deformations, yet existing multimodal fusion strategies fail to fully integrate these features with sMRI. We propose a cross-attention fusion framework to model the intrinsic relationship between sMRI intensity and JSM-derived deformations for AD classification. Using the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, we compare cross-attention, pairwise self-attention, and bottleneck attention with four pre-trained 3D image encoders. Cross-attention fusion achieves superior performance, with mean ROC-AUC scores of 0.903 (+/-0.033) for AD vs. cognitively normal (CN) and 0.692 (+/-0.061) for mild cognitive impairment (MCI) vs. CN. Despite its strong performance, our model remains highly efficient, with only 1.56 million parameters--over 40 times fewer than ResNet-34 (63M) and Swin UNETR (61.98M). These findings demonstrate the potential of cross-attention fusion for improving AD diagnosis while maintaining computational efficiency.
IRNov 25, 2019
FLATM: A Fuzzy Logic Approach Topic Model for Medical DocumentsAmir Karami, Aryya Gangopadhyay, Bin Zhou et al.
One of the challenges for text analysis in medical domains is analyzing large-scale medical documents. As a consequence, finding relevant documents has become more difficult. One of the popular methods to retrieve information based on discovering the themes in the documents is topic modeling. The themes in the documents help to retrieve documents on the same topic with and without a query. In this paper, we present a novel approach to topic modeling using fuzzy clustering. To evaluate our model, we experiment with two text datasets of medical documents. The evaluation metrics carried out through document classification and document modeling show that our model produces better performance than LDA, indicating that fuzzy set theory can improve the performance of topic models in medical domains.
MLMay 2, 2017
Fuzzy Approach Topic Discovery in Health and Medical CorporaAmir Karami, Aryya Gangopadhyay, Bin Zhou et al.
The majority of medical documents and electronic health records (EHRs) are in text format that poses a challenge for data processing and finding relevant documents. Looking for ways to automatically retrieve the enormous amount of health and medical knowledge has always been an intriguing topic. Powerful methods have been developed in recent years to make the text processing automatic. One of the popular approaches to retrieve information based on discovering the themes in health & medical corpora is topic modeling, however, this approach still needs new perspectives. In this research we describe fuzzy latent semantic analysis (FLSA), a novel approach in topic modeling using fuzzy perspective. FLSA can handle health & medical corpora redundancy issue and provides a new method to estimate the number of topics. The quantitative evaluations show that FLSA produces superior performance and features to latent Dirichlet allocation (LDA), the most popular topic model.