Hassan Keshvarikhojasteh

h-index: 1

5 papers

26 citations

Novelty: 46%

AI Score: 43

Ranked #55,174 of 194,257 authors (top 28%); #19,145 in CV (top 32%)

5 Papers

7.4 | QM | May 12 | Code

Attention-Based Multimodal Survival Prediction with Cross-Modal Bilinear Fusion

Hassan Keshvarikhojasteh, Josien P. W. Pluim, Mitko Veta

We propose a novel multimodal deep learning framework for patient-level survival prediction that integrates whole-slide histology features, RNA-seq expression profiles, and clinical variables. Our architecture combines an ABMIL module (Ilse et al., 2018) for slide-level representation with feedforward encoders for RNA and clinical data. These embeddings are then integrated through low-rank bilinear cross-modal fusion (Liu et al., 2018) to model conditional interactions across modalities while controlling parameter growth. The model outputs continuous risk scores that are subsequently mapped to survival times using a nonparametric calibration procedure based on the Kaplan-Meier estimator (Kaplan and Meier, 1958). By decomposing multimodal reasoning into independent pairwise interactions, the proposed fusion design promotes structural interpretability and parameter efficiency compared with full tensor and hierarchical fusion strategies. Experiments on the CHIMERA challenge dataset demonstrate improved predictive performance over concatenation-based baselines and competitive generalization on hidden evaluation cohorts. These results indicate that the proposed framework is a promising approach for multimodal survival prediction in high-risk non-muscle-invasive bladder cancer (HR-NMIBC). The implementation is publicly available at https://github.com/hassancpu/ChimeraChallenge2025_Task_3.
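To illustrate why a low-rank bilinear fusion controls parameter growth, the sketch below computes a pairwise interaction between two modality embeddings in factored form and checks it against the equivalent full bilinear form. It is a minimal, generic illustration and not the authors' implementation: the names `U`, `V`, `P`, the dimensions, and the random weights are all assumptions made for the example.

```python
import random

def matvec_t(M, x):
    """Return M^T x, where M is a list of len(x) rows."""
    cols = len(M[0])
    return [sum(M[i][j] * x[i] for i in range(len(x))) for j in range(cols)]

def low_rank_bilinear(x, y, U, V, P):
    """z = P^T ((U^T x) * (V^T y)), with * the elementwise product."""
    hx = matvec_t(U, x)                       # project modality x to rank-r space
    hy = matvec_t(V, y)                       # project modality y to rank-r space
    joint = [a * b for a, b in zip(hx, hy)]   # pairwise interaction
    return matvec_t(P, joint)                 # map to the output dimension

def full_bilinear(x, y, U, V, P):
    """Reference form: z_k = x^T W_k y, W_k = sum_r P[r][k] * U[:, r] V[:, r]^T."""
    rank, out = len(P), len(P[0])
    z = []
    for k in range(out):
        total = 0.0
        for i in range(len(x)):
            for j in range(len(y)):
                w = sum(P[r][k] * U[i][r] * V[j][r] for r in range(rank))
                total += x[i] * w * y[j]
        z.append(total)
    return z

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes: histology embedding (6), RNA embedding (5), rank 3, output 4.
rng = random.Random(0)
dx, dy, rank, out = 6, 5, 3, 4
x = [rng.uniform(-1, 1) for _ in range(dx)]
y = [rng.uniform(-1, 1) for _ in range(dy)]
U, V, P = rand_matrix(dx, rank, rng), rand_matrix(dy, rank, rng), rand_matrix(rank, out, rng)

z = low_rank_bilinear(x, y, U, V, P)
n_low_rank = rank * (dx + dy + out)   # parameters in the factored form
n_full = dx * dy * out                # parameters in an unconstrained bilinear map
print(len(z), n_low_rank, n_full)
```

With these toy sizes the factored form uses 45 weights where an unconstrained bilinear map would need 120, and the gap widens as the embedding dimensions grow.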

10.5 | CV | Apr 8, 2024 | Code

Multi-head Attention-based Deep Multiple Instance Learning

Hassan Keshvarikhojasteh, Josien Pluim, Mitko Veta

This paper introduces MAD-MIL, a Multi-head Attention-based Deep Multiple Instance Learning model designed for weakly supervised whole slide image (WSI) classification in digital pathology. Inspired by the multi-head attention mechanism of the Transformer, MAD-MIL reduces model complexity while achieving competitive results against advanced models like CLAM and DS-MIL. Evaluated on MNIST-BAGS and on public datasets, including TUPAC16, TCGA BRCA, TCGA LUNG, and TCGA KIDNEY, MAD-MIL consistently outperforms ABMIL. This demonstrates enhanced information diversity, interpretability, and efficiency in slide representation. The model's effectiveness, coupled with fewer trainable parameters and lower computational complexity, makes it a promising solution for automated pathology workflows. Our code is available at https://github.com/tueimage/MAD-MIL.
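The idea of pooling a bag of patch features with several attention heads can be sketched as follows. This is a simplified illustration and not the MAD-MIL code: it assumes the feature vector is split into equal chunks, one per head, and it scores each patch with `w . tanh(h)`, which omits the learned projection used in ABMIL-style attention. All function names and weights are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)                              # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(instances, w):
    """Attention-weighted average of instances; weights are softmax(w . tanh(h_k))."""
    scores = [sum(wj * math.tanh(hj) for wj, hj in zip(w, h)) for h in instances]
    alphas = softmax(scores)
    dim = len(instances[0])
    pooled = [sum(a * h[j] for a, h in zip(alphas, instances)) for j in range(dim)]
    return pooled, alphas

def multi_head_pool(bag, head_weights):
    """Split features into one chunk per head, pool each chunk, concatenate."""
    n_heads = len(head_weights)
    chunk = len(bag[0]) // n_heads
    slide_vec, all_alphas = [], []
    for h, w in enumerate(head_weights):
        sub = [inst[h * chunk:(h + 1) * chunk] for inst in bag]
        pooled, alphas = attention_pool(sub, w)
        slide_vec.extend(pooled)        # concatenated slide-level representation
        all_alphas.append(alphas)       # one attention map per head
    return slide_vec, all_alphas

# Toy bag: 3 patches with 4 features each, pooled by 2 heads.
bag = [[0.1, 0.9, -0.3, 0.2],
       [0.8, -0.2, 0.5, 0.4],
       [-0.5, 0.3, 0.7, -0.6]]
head_weights = [[1.0, -1.0], [0.5, 2.0]]
slide_vec, all_alphas = multi_head_pool(bag, head_weights)
print(slide_vec)
print(all_alphas)
```

Because each head produces its own attention weights over the patches, the per-head maps in `all_alphas` can be inspected separately, which is one way such a design can expose more diverse evidence than a single attention map.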

5.1 | IV | Apr 24, 2025 | Code

A Spatially-Aware Multiple Instance Learning Framework for Digital Pathology

Hassan Keshvarikhojasteh, Mihail Tifrea, Sibylle Hess et al.

Multiple instance learning (MIL) is a promising approach for weakly supervised classification in pathology using whole slide images (WSIs). However, conventional MIL methods such as Attention-Based Deep Multiple Instance Learning (ABMIL) typically disregard spatial interactions among patches that are crucial to pathological diagnosis. Recent advancements, such as Transformer-based MIL (TransMIL), have incorporated spatial context and inter-patch relationships. However, it remains unclear whether explicitly modeling patch relationships yields similar performance gains in ABMIL, which relies solely on Multi-Layer Perceptrons (MLPs). In contrast, TransMIL employs Transformer-based layers, introducing a fundamental architectural shift at the cost of substantially increased computational complexity. In this work, we enhance the ABMIL framework by integrating interaction-aware representations to address this question. Our proposed model, Global ABMIL (GABMIL), explicitly captures inter-instance dependencies while preserving computational efficiency. Experimental results on two publicly available datasets for tumor subtyping in breast and lung cancers demonstrate that GABMIL achieves up to a 7 percentage point improvement in AUPRC and a 5 percentage point increase in the Kappa score over ABMIL, with minimal or no additional computational overhead. These findings underscore the importance of incorporating patch interactions within MIL frameworks. Our code is available at https://github.com/tueimage/GABMIL.

6.5 | CV | Mar 8, 2024

Multiple Instance Learning with random sampling for Whole Slide Image Classification

H. Keshvarikhojasteh, J. P. W. Pluim, M. Veta

In computational pathology, random sampling of patches during training of Multiple Instance Learning (MIL) methods is computationally efficient and serves as a regularization strategy. Despite these benefits, questions remain about how performance varies with sample size and how sampling influences model interpretability. Addressing these questions, we obtain the largest performance gain of 1.7% when using thirty percent of patches on the CAMELYON16 dataset, and of 3.7% with only eight samples on the TUPAC16 dataset. We also find that interpretability effects are strongly dataset-dependent: interpretability is affected on CAMELYON16 but remains unaffected on TUPAC16. This reinforces that the relationships of both performance and interpretability with sampling are closely task-specific. End-to-end training with 1024 samples yields improvements on both datasets compared to pre-extracted features, further highlighting the potential of this efficient approach.
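The sampling step itself can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code: the helper `sample_patches` and its parameters are hypothetical, and patches are drawn uniformly without replacement, either as a fixed count (for example eight) or as a fraction of the bag (for example thirty percent).

```python
import random

def sample_patches(bag, rng, n_samples=None, fraction=None):
    """Return a uniformly random subset of a bag's patches, without replacement."""
    if (n_samples is None) == (fraction is None):
        raise ValueError("give exactly one of n_samples or fraction")
    if fraction is not None:
        n_samples = max(1, round(fraction * len(bag)))
    n_samples = min(n_samples, len(bag))   # small bags are used whole
    return rng.sample(bag, n_samples)

# Toy slide with 100 patch indices standing in for patch features.
rng = random.Random(0)
bag = list(range(100))
subset_fraction = sample_patches(bag, rng, fraction=0.3)
subset_fixed = sample_patches(bag, rng, n_samples=8)
print(len(subset_fraction), len(subset_fixed))
```

Calling the helper anew for every training step means the model sees a different subset of each slide over time, which is the sense in which sampling can act as a regularizer.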

2.6 | CV | Aug 7, 2021

Temporal Action Localization Using Gated Recurrent Units

Hassan Keshvarikhojasteh, Hoda Mohammadzade, Hamid Behroozi

The Temporal Action Localization (TAL) task, which is to predict the start and end of each action in a video along with its class label, has numerous real-world applications. Because of the complexity of this task, acceptable accuracy has not yet been achieved, unlike in the action recognition task. In this paper, we propose a new network based on the Gated Recurrent Unit (GRU) and two novel post-processing methods for the TAL task. Specifically, we propose a new design for the output layer of the conventional GRU, resulting in the so-called GRU-Split network. Moreover, linear interpolation is used to generate action proposals with precise start and end times. Finally, to rank the generated proposals appropriately, we use a Learn to Rank (LTR) approach. We evaluated the proposed method on the Thumos14 and ActivityNet-1.3 datasets. Results show that the proposed method outperforms state-of-the-art methods. Specifically, in the mean Average Precision (mAP) metric at an Intersection over Union (IoU) of 0.7 on Thumos14, we obtain 27.52%, which is 5.12% better than that of state-of-the-art methods.
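For readers unfamiliar with the IoU threshold mentioned above, the sketch below shows how the temporal IoU between a predicted and a ground-truth segment is typically computed. It is a generic illustration, not the paper's evaluation code: the function names are hypothetical, and a full mAP computation would additionally require class matching, one-to-one assignment of proposals to ground truth, and ranking by confidence.

```python
def temporal_iou(pred, gt):
    """IoU of two (start, end) intervals given in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def meets_threshold(pred, gt, threshold=0.7):
    """True if the proposal overlaps the ground truth at or above the IoU threshold."""
    return temporal_iou(pred, gt) >= threshold

# A proposal that starts one second late on a ten-second action.
print(temporal_iou((1.0, 10.0), (0.0, 10.0)))     # 0.9
print(meets_threshold((1.0, 10.0), (0.0, 10.0)))  # True
print(meets_threshold((0.0, 10.0), (5.0, 15.0)))  # False
```

At a threshold of 0.7 even modest boundary errors disqualify a proposal, which is why precise start and end times matter for this metric.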