Salman Ul Hassan Dar

h-index25

15papers

1,168citations

Novelty52%

AI Score47

Ranked #54,120 of 200,471 authors (top 27%)#20,109 in CV (top 34%)

15 Papers

SPMay 23, 2022

BolT: Fused Window Transformers for fMRI Time Series Analysis

Hasan Atakan Bedel, Irmak Şıvgın, Onat Dalmaz et al.

Deep-learning models have enabled performance leaps in analysis of high-dimensional functional MRI (fMRI) data. Yet, many previous methods are suboptimally sensitive for contextual representations across diverse time scales. Here, we present BolT, a blood-oxygen-level-dependent transformer model, for analyzing multi-variate fMRI time series. BolT leverages a cascade of transformer encoders equipped with a novel fused window attention mechanism. Encoding is performed on temporally-overlapped windows within the time series to capture local representations. To integrate information temporally, cross-window attention is computed between base tokens in each window and fringe tokens from neighboring windows. To gradually transition from local to global representations, the extent of window overlap and thereby number of fringe tokens are progressively increased across the cascade. Finally, a novel cross-window regularization is employed to align high-level classification features across the time series. Comprehensive experiments on large-scale public datasets demonstrate the superior performance of BolT against state-of-the-art methods. Furthermore, explanatory analyses to identify landmark time points and regions that contribute most significantly to model decisions corroborate prominent neuroscientific findings in the literature.

CVJul 3, 2023

Investigating Data Memorization in 3D Latent Diffusion Models for Medical Image Synthesis

Salman Ul Hassan Dar, Arman Ghanaat, Jannik Kahmann et al.

Generative latent diffusion models have been established as state-of-the-art in data generation. One promising application is generation of realistic synthetic medical imaging data for open data sharing without compromising patient privacy. Despite the promise, the capacity of such models to memorize sensitive patient training data and synthesize samples showing high resemblance to training data samples is relatively unexplored. Here, we assess the memorization capacity of 3D latent diffusion models on photon-counting coronary computed tomography angiography and knee magnetic resonance imaging datasets. To detect potential memorization of training samples, we utilize self-supervised models based on contrastive learning. Our results suggest that such latent diffusion models indeed memorize training data, and there is a dire need for devising strategies to mitigate memorization.

CVMar 26Code

CardioDiT: Latent Diffusion Transformers for 4D Cardiac MRI Synthesis

Marvin Seyfarth, Sarah Kaye Müller, Arman Ghanaat et al.

Latent diffusion models (LDMs) have recently achieved strong performance in 3D medical image synthesis. However, modalities like cine cardiac MRI (CMR), representing a temporally synchronized 3D volume across the cardiac cycle, add an additional dimension that most generative approaches do not model directly. Instead, they factorize space and time or enforce temporal consistency through auxiliary mechanisms such as anatomical masks. Such strategies introduce structural biases that may limit global context integration and lead to subtle spatiotemporal discontinuities or physiologically inconsistent cardiac dynamics. We investigate whether a unified 4D generative model can learn continuous cardiac dynamics without architectural factorization. We propose CardioDiT, a fully 4D latent diffusion framework for short-axis cine CMR synthesis based on diffusion transformers. A spatiotemporal VQ-VAE encodes 2D+t slices into compact latents, which a diffusion transformer then models jointly as complete 3D+t volumes, coupling space and time throughout the generative process. We evaluate CardioDiT on public CMR datasets and a larger private cohort, comparing it to baselines with progressively stronger spatiotemporal coupling. Results show improved inter-slice consistency, temporally coherent motion, and realistic cardiac function distributions, suggesting that explicit 4D modeling with a diffusion transformer provides a principled foundation for spatiotemporal cardiac image synthesis. Code and models trained on public data are available at https://github.com/Cardio-AI/cardiodit.

CVMar 26Code

VolDiT: Controllable Volumetric Medical Image Synthesis with Diffusion Transformers

Marvin Seyfarth, Salman Ul Hassan Dar, Yannik Frisch et al.

Diffusion models have become a leading approach for high-fidelity medical image synthesis. However, most existing methods for 3D medical image generation rely on convolutional U-Net backbones within latent diffusion frameworks. While effective, these architectures impose strong locality biases and limited receptive fields, which may constrain scalability, global context integration, and flexible conditioning. In this work, we introduce VolDiT, the first purely transformer-based 3D Diffusion Transformer for volumetric medical image synthesis. Our approach extends diffusion transformers to native 3D data through volumetric patch embeddings and global self-attention operating directly over 3D tokens. To enable structured control, we propose a timestep-gated control adapter that maps segmentation masks into learnable control tokens that modulate transformer layers during denoising. This token-level conditioning mechanism allows precise spatial guidance while preserving the modeling advantages of transformer architectures. We evaluate our model on high-resolution 3D medical image synthesis tasks and compare it to state-of-the-art 3D latent diffusion models based on U-Nets. Results demonstrate improved global coherence, superior generative fidelity, and enhanced controllability. Our findings suggest that fully transformerbased diffusion models provide a flexible foundation for volumetric medical image synthesis. The code and models trained on public data are available at https://github.com/Cardio-AI/voldit.

LGJul 20, 2024

Latent Pollution Model: The Hidden Carbon Footprint in 3D Image Synthesis

Marvin Seyfarth, Salman Ul Hassan Dar, Sandy Engelhardt

Contemporary developments in generative AI are rapidly transforming the field of medical AI. These developments have been predominantly driven by the availability of large datasets and high computing power, which have facilitated a significant increase in model capacity. Despite their considerable potential, these models demand substantially high power, leading to high carbon dioxide (CO2) emissions. Given the harm such models are causing to the environment, there has been little focus on the carbon footprints of such models. This study analyzes carbon emissions from 2D and 3D latent diffusion models (LDMs) during training and data generation phases, revealing a surprising finding: the synthesis of large images contributes most significantly to these emissions. We assess different scenarios including model sizes, image dimensions, distributed training, and data generation steps. Our findings reveal substantial carbon emissions from these models, with training 2D and 3D models comparable to driving a car for 10 km and 90 km, respectively. The process of data generation is even more significant, with CO2 emissions equivalent to driving 160 km for 2D models and driving for up to 3345 km for 3D synthesis. Additionally, we found that the location of the experiment can increase carbon emissions by up to 94 times, and even the time of year can influence emissions by up to 50%. These figures are alarming, considering they represent only a single training and data generation phase for each model. Our results emphasize the urgent need for developing environmentally sustainable strategies in generative AI.

IVFeb 1, 2024

Unconditional Latent Diffusion Models Memorize Patient Imaging Data: Implications for Openly Sharing Synthetic Data

Salman Ul Hassan Dar, Marvin Seyfarth, Isabelle Ayx et al.

AI models present a wide range of applications in the field of medicine. However, achieving optimal performance requires access to extensive healthcare data, which is often not readily available. Furthermore, the imperative to preserve patient privacy restricts patient data sharing with third parties and even within institutes. Recently, generative AI models have been gaining traction for facilitating open-data sharing by proposing synthetic data as surrogates of real patient data. Despite the promise, some of these models are susceptible to patient data memorization, where models generate patient data copies instead of novel synthetic samples. Considering the importance of the problem, surprisingly it has received relatively little attention in the medical imaging community. To this end, we assess memorization in unconditional latent diffusion models. We train latent diffusion models on CT, MR, and X-ray datasets for synthetic data generation. We then detect the amount of training data memorized utilizing our novel self-supervised copy detection approach and further investigate various factors that can influence memorization. Our findings show a surprisingly high degree of patient data memorization across all datasets. Comparison with non-diffusion generative models, such as autoencoders and generative adversarial networks, indicates that while latent diffusion models are more susceptible to memorization, overall they outperform non-diffusion models in synthesis quality. Further analyses reveal that using augmentation strategies, small architecture, and increasing dataset can reduce memorization while over-training the models can enhance it. Collectively, our results emphasize the importance of carefully training generative models on private medical imaging datasets, and examining the synthetic data to ensure patient privacy before sharing it for medical research and applications.

CVMar 17, 2025

MedLoRD: A Medical Low-Resource Diffusion Model for High-Resolution 3D CT Image Synthesis

Marvin Seyfarth, Salman Ul Hassan Dar, Isabelle Ayx et al.

Advancements in AI for medical imaging offer significant potential. However, their applications are constrained by the limited availability of data and the reluctance of medical centers to share it due to patient privacy concerns. Generative models present a promising solution by creating synthetic data as a substitute for real patient data. However, medical images are typically high-dimensional, and current state-of-the-art methods are often impractical for computational resource-constrained healthcare environments. These models rely on data sub-sampling, raising doubts about their feasibility and real-world applicability. Furthermore, many of these models are evaluated on quantitative metrics that alone can be misleading in assessing the image quality and clinical meaningfulness of the generated images. To address this, we introduce MedLoRD, a generative diffusion model designed for computational resource-constrained environments. MedLoRD is capable of generating high-dimensional medical volumes with resolutions up to 512$\times$512$\times$256, utilizing GPUs with only 24GB VRAM, which are commonly found in standard desktop workstations. MedLoRD is evaluated across multiple modalities, including Coronary Computed Tomography Angiography and Lung Computed Tomography datasets. Extensive evaluations through radiological evaluation, relative regional volume analysis, adherence to conditional masks, and downstream tasks show that MedLoRD generates high-fidelity images closely adhering to segmentation mask conditions, surpassing the capabilities of current state-of-the-art generative models for medical image synthesis in computational resource-constrained environments.

CVMar 13, 2021

A Few-Shot Learning Approach for Accelerated MRI via Fusion of Data-Driven and Subject-Driven Priors

Salman Ul Hassan Dar, Mahmut Yurt, Tolga Çukur

Deep neural networks (DNNs) have recently found emerging use in accelerated MRI reconstruction. DNNs typically learn data-driven priors from large datasets constituting pairs of undersampled and fully-sampled acquisitions. Acquiring such large datasets, however, might be impractical. To mitigate this limitation, we propose a few-shot learning approach for accelerated MRI that merges subject-driven priors obtained via physical signal models with data-driven priors obtained from a few training samples. Demonstrations on brain MR images from the NYU fastMRI dataset indicate that the proposed approach requires just a few samples to outperform traditional parallel imaging and DNN algorithms.

IVDec 18, 2020

Three Dimensional MR Image Synthesis with Progressive Generative Adversarial Networks

Muzaffer Özbey, Mahmut Yurt, Salman Ul Hassan Dar et al.

Mainstream deep models for three-dimensional MRI synthesis are either cross-sectional or volumetric depending on the input. Cross-sectional models can decrease the model complexity, but they may lead to discontinuity artifacts. On the other hand, volumetric models can alleviate the discontinuity artifacts, but they might suffer from loss of spatial resolution due to increased model complexity coupled with scarce training data. To mitigate the limitations of both approaches, we propose a novel model that progressively recovers the target volume via simpler synthesis tasks across individual orientations.

IVNov 29, 2020

Semi-Supervised Learning of Mutually Accelerated MRI Synthesis without Fully-Sampled Ground Truths

Mahmut Yurt, Salman Ul Hassan Dar, Muzaffer Özbey et al.

Learning-based synthetic multi-contrast MRI commonly involves deep models trained using high-quality images of source and target contrasts, regardless of whether source and target domain samples are paired or unpaired. This results in undesirable reliance on fully-sampled acquisitions of all MRI contrasts, which might prove impractical due to limitations on scan costs and time. Here, we propose a novel semi-supervised deep generative model that instead learns to recover high-quality target images directly from accelerated acquisitions of source and target contrasts. To achieve this, the proposed model introduces novel multi-coil tensor losses in image, k-space and adversarial domains. These selective losses are based only on acquired k-space samples, and randomized sampling masks are used across subjects to capture relationships among acquired and non-acquired k-space regions. Comprehensive experiments on multi-contrast neuroimaging datasets demonstrate that our semi-supervised approach yields equivalent performance to gold-standard fully-supervised models, while outperforming a cascaded approach that learns to synthesize based on reconstructions of undersampled data. Therefore, the proposed approach holds great promise to improve the feasibility and utility of accelerated MRI acquisitions mutually undersampled across both contrast sets and k-space.

CVNov 27, 2020

Progressively Volumetrized Deep Generative Models for Data-Efficient Contextual Learning of MR Image Recovery

Mahmut Yurt, Muzaffer Özbey, Salman Ul Hassan Dar et al.

Magnetic resonance imaging (MRI) offers the flexibility to image a given anatomic volume under a multitude of tissue contrasts. Yet, scan time considerations put stringent limits on the quality and diversity of MRI data. The gold-standard approach to alleviate this limitation is to recover high-quality images from data undersampled across various dimensions, most commonly the Fourier domain or contrast sets. A primary distinction among recovery methods is whether the anatomy is processed per volume or per cross-section. Volumetric models offer enhanced capture of global contextual information, but they can suffer from suboptimal learning due to elevated model complexity. Cross-sectional models with lower complexity offer improved learning behavior, yet they ignore contextual information across the longitudinal dimension of the volume. Here, we introduce a novel progressive volumetrization strategy for generative models (ProvoGAN) that serially decomposes complex volumetric image recovery tasks into successive cross-sectional mappings task-optimally ordered across individual rectilinear dimensions. ProvoGAN effectively captures global context and recovers fine-structural details across all dimensions, while maintaining low model complexity and improved learning behaviour. Comprehensive demonstrations on mainstream MRI reconstruction and synthesis tasks show that ProvoGAN yields superior performance to state-of-the-art volumetric and cross-sectional models.

IVSep 25, 2019

mustGAN: Multi-Stream Generative Adversarial Networks for MR Image Synthesis

Mahmut Yurt, Salman Ul Hassan Dar, Aykut Erdem et al.

Multi-contrast MRI protocols increase the level of morphological information available for diagnosis. Yet, the number and quality of contrasts is limited in practice by various factors including scan time and patient motion. Synthesis of missing or corrupted contrasts can alleviate this limitation to improve clinical utility. Common approaches for multi-contrast MRI involve either one-to-one and many-to-one synthesis methods. One-to-one methods take as input a single source contrast, and they learn a latent representation sensitive to unique features of the source. Meanwhile, many-to-one methods receive multiple distinct sources, and they learn a shared latent representation more sensitive to common features across sources. For enhanced image synthesis, here we propose a multi-stream approach that aggregates information across multiple source images via a mixture of multiple one-to-one streams and a joint many-to-one stream. The shared feature maps generated in the many-to-one stream and the complementary feature maps generated in the one-to-one streams are combined with a fusion block. The location of the fusion block is adaptively modified to maximize task-specific performance. Qualitative and quantitative assessments on T1-, T2-, PD-weighted and FLAIR images clearly demonstrate the superior performance of the proposed method compared to previous state-of-the-art one-to-one and many-to-one methods.

CVMay 27, 2018

Synergistic Reconstruction and Synthesis via Generative Adversarial Networks for Accelerated Multi-Contrast MRI

Salman Ul Hassan Dar, Mahmut Yurt, Mohammad Shahdloo et al.

Multi-contrast MRI acquisitions of an anatomy enrich the magnitude of information available for diagnosis. Yet, excessive scan times associated with additional contrasts may be a limiting factor. Two mainstream approaches for enhanced scan efficiency are reconstruction of undersampled acquisitions and synthesis of missing acquisitions. In reconstruction, performance decreases towards higher acceleration factors with diminished sampling density particularly at high-spatial-frequencies. In synthesis, the absence of data samples from the target contrast can lead to artefactual sensitivity or insensitivity to image features. Here we propose a new approach for synergistic reconstruction-synthesis of multi-contrast MRI based on conditional generative adversarial networks. The proposed method preserves high-frequency details of the target contrast by relying on the shared high-frequency information available from the source contrast, and prevents feature leakage or loss by relying on the undersampled acquisitions of the target contrast. Demonstrations on brain MRI datasets from healthy subjects and patients indicate the superior performance of the proposed method compared to previous state-of-the-art. The proposed method can help improve the quality and scan efficiency of multi-contrast MRI exams.

CVFeb 5, 2018

Image Synthesis in Multi-Contrast MRI with Conditional Generative Adversarial Networks

Salman Ul Hassan Dar, Mahmut Yurt, Levent Karacan et al.

Acquiring images of the same anatomy with multiple different contrasts increases the diversity of diagnostic information available in an MR exam. Yet, scan time limitations may prohibit acquisition of certain contrasts, and images for some contrast may be corrupted by noise and artifacts. In such cases, the ability to synthesize unacquired or corrupted contrasts from remaining contrasts can improve diagnostic utility. For multi-contrast synthesis, current methods learn a nonlinear intensity transformation between the source and target images, either via nonlinear regression or deterministic neural networks. These methods can in turn suffer from loss of high-spatial-frequency information in synthesized images. Here we propose a new approach for multi-contrast MRI synthesis based on conditional generative adversarial networks. The proposed approach preserves high-frequency details via an adversarial loss; and it offers enhanced synthesis performance via a pixel-wise loss for registered multi-contrast images and a cycle-consistency loss for unregistered images. Information from neighboring cross-sections are utilized to further improved synthesis quality. Demonstrations on T1- and T2-weighted images from healthy subjects and patients clearly indicate the superior performance of the proposed approach compared to previous state-of-the-art methods. Our synthesis approach can help improve quality and versatility of multi-contrast MRI exams without the need for prolonged examinations.

CVOct 7, 2017

A Transfer-Learning Approach for Accelerated MRI using Deep Neural Networks

Salman Ul Hassan Dar, Muzaffer Özbey, Ahmet Burak Çatlı et al.

Purpose: Neural networks have received recent interest for reconstruction of undersampled MR acquisitions. Ideally network performance should be optimized by drawing the training and testing data from the same domain. In practice, however, large datasets comprising hundreds of subjects scanned under a common protocol are rare. The goal of this study is to introduce a transfer-learning approach to address the problem of data scarcity in training deep networks for accelerated MRI. Methods: Neural networks were trained on thousands of samples from public datasets of either natural images or brain MR images. The networks were then fine-tuned using only few tens of brain MR images in a distinct testing domain. Domain-transferred networks were compared to networks trained directly in the testing domain. Network performance was evaluated for varying acceleration factors (2-10), number of training samples (0.5-4k) and number of fine-tuning samples (0-100). Results: The proposed approach achieves successful domain transfer between MR images acquired with different contrasts (T1- and T2-weighted images), and between natural and MR images (ImageNet and T1- or T2-weighted images). Networks obtained via transfer-learning using only tens of images in the testing domain achieve nearly identical performance to networks trained directly in the testing domain using thousands of images. Conclusion: The proposed approach might facilitate the use of neural networks for MRI reconstruction without the need for collection of extensive imaging datasets.