Burhaneddin Yaman

IV
h-index34
30papers
912citations
Novelty46%
AI Score44

30 Papers

IVMar 23, 2022
Physics-Driven Deep Learning for Computational Magnetic Resonance Imaging

Kerstin Hammernik, Thomas Küstner, Burhaneddin Yaman et al.

Physics-driven deep learning methods have emerged as a powerful tool for computational magnetic resonance imaging (MRI) problems, pushing reconstruction performance to new limits. This article provides an overview of the recent developments in incorporating physics information into learning-based MRI reconstruction. We consider inverse problems with both linear and non-linear forward models for computational MRI, and review the classical approaches for solving these. We then focus on physics-driven deep learning approaches, covering physics-driven loss functions, plug-and-play methods, generative models, and unrolled networks. We highlight domain-specific challenges such as real- and complex-valued building blocks of neural networks, and translational applications in MRI with linear and non-linear forward models. Finally, we discuss common issues and open challenges, and draw connections to the importance of physics-driven learning when combined with other downstream tasks in the medical imaging pipeline.

IVApr 17, 2022
Accelerated MRI With Deep Linear Convolutional Transform Learning

Hongyi Gu, Burhaneddin Yaman, Steen Moeller et al.

Recent studies show that deep learning (DL) based MRI reconstruction outperforms conventional methods, such as parallel imaging and compressed sensing (CS), in multiple applications. Unlike CS that is typically implemented with pre-determined linear representations for regularization, DL inherently uses a non-linear representation learned from a large database. Another line of work uses transform learning (TL) to bridge the gap between these two approaches by learning linear representations from data. In this work, we combine ideas from CS, TL and DL reconstructions to learn deep linear convolutional transforms as part of an algorithm unrolling approach. Using end-to-end training, our results show that the proposed technique can reconstruct MR images to a level comparable to DL methods, while supporting uniform undersampling patterns unlike conventional CS methods. Our proposed method relies on convex sparse image reconstruction with linear representation at inference time, which may be beneficial for characterizing robustness, stability and generalizability.

CVAug 17, 2022
Object Detection for Autonomous Dozers

Chun-Hao Liu, Burhaneddin Yaman

We introduce a new type of autonomous vehicle - an autonomous dozer that is expected to complete construction site tasks in an efficient, robust, and safe manner. To better handle the path planning for the dozer and ensure construction site safety, object detection plays one of the most critical components among perception tasks. In this work, we first collect the construction site data by driving around our dozers. Then we analyze the data thoroughly to understand its distribution. Finally, two well-known object detection models are trained, and their performances are benchmarked with a wide range of training strategies and hyperparameters.

CVMar 19
Reconstruction Matters: Learning Geometry-Aligned BEV Representation through 3D Gaussian Splatting

Yiren Lu, Xin Ye, Burhaneddin Yaman et al.

Bird's-Eye-View (BEV) perception serves as a cornerstone for autonomous driving, offering a unified spatial representation that fuses surrounding-view images to enable reasoning for various downstream tasks, such as semantic segmentation, 3D object detection, and motion prediction. However, most existing BEV perception frameworks adopt an end-to-end training paradigm, where image features are directly transformed into the BEV space and optimized solely through downstream task supervision. This formulation treats the entire perception process as a black box, often lacking explicit 3D geometric understanding and interpretability, leading to suboptimal performance. In this paper, we claim that an explicit 3D representation matters for accurate BEV perception, and we propose Splat2BEV, a Gaussian Splatting-assisted framework for BEV tasks. Splat2BEV aims to learn BEV feature representations that are both semantically rich and geometrically precise. We first pre-train a Gaussian generator that explicitly reconstructs 3D scenes from multi-view inputs, enabling the generation of geometry-aligned feature representations. These representations are then projected into the BEV space to serve as inputs for downstream tasks. Extensive experiments on nuScenes and argoverse dataset demonstrate that Splat2BEV achieves state-of-the-art performance and validate the effectiveness of incorporating explicit 3D reconstruction into BEV perception.

LGNov 2, 2023
Deep Double Descent for Time Series Forecasting: Avoiding Undertrained Models

Valentino Assandri, Sam Heshmati, Burhaneddin Yaman et al.

Deep learning models, particularly Transformers, have achieved impressive results in various domains, including time series forecasting. While existing time series literature primarily focuses on model architecture modifications and data augmentation techniques, this paper explores the training schema of deep learning models for time series; how models are trained regardless of their architecture. We perform extensive experiments to investigate the occurrence of deep double descent in several Transformer models trained on public time series data sets. We demonstrate epoch-wise deep double descent and that overfitting can be reverted using more epochs. Leveraging these findings, we achieve state-of-the-art results for long sequence time series forecasting in nearly 70% of the 72 benchmarks tested. This suggests that many models in the literature may possess untapped potential. Additionally, we introduce a taxonomy for classifying training schema modifications, covering data augmentation, model inputs, model targets, time series per model, and computational budget.

CVMar 24, 2024Code
PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference

Tanvir Mahmud, Burhaneddin Yaman, Chun-Hao Liu et al.

As deep neural networks evolve from convolutional neural networks (ConvNets) to advanced vision transformers (ViTs), there is an increased need to eliminate redundant data for faster processing without compromising accuracy. Previous methods are often architecture-specific or necessitate re-training, restricting their applicability with frequent model updates. To solve this, we first introduce a novel property of lightweight ConvNets: their ability to identify key discriminative patch regions in images, irrespective of model's final accuracy or size. We demonstrate that fully-connected layers are the primary bottleneck for ConvNets performance, and their suppression with simple weight recalibration markedly enhances discriminative patch localization performance. Using this insight, we introduce PaPr, a method for substantially pruning redundant patches with minimal accuracy loss using lightweight ConvNets across a variety of deep learning architectures, including ViTs, ConvNets, and hybrid transformers, without any re-training. Moreover, the simple early-stage one-step patch pruning with PaPr enhances existing patch reduction methods. Through extensive testing on diverse architectures, PaPr achieves significantly higher accuracy over state-of-the-art patch reduction methods with similar FLOP count reduction. More specifically, PaPr reduces about 70% of redundant patches in videos with less than 0.8% drop in accuracy, and up to 3.7x FLOPs reduction, which is a 15% more reduction with 2.5% higher accuracy. Code is released at https://github.com/tanvir-utexas/PaPr.

CVJan 10, 2024
VLP: Vision Language Planning for Autonomous Driving

Chenbin Pan, Burhaneddin Yaman, Tommaso Nesti et al.

Autonomous driving is a complex and challenging task that aims at safe motion planning through scene understanding and reasoning. While vision-only autonomous driving methods have recently achieved notable performance, through enhanced scene understanding, several key issues, including lack of reasoning, low generalization performance and long-tail scenarios, still need to be addressed. In this paper, we present VLP, a novel Vision-Language-Planning framework that exploits language models to bridge the gap between linguistic understanding and autonomous driving. VLP enhances autonomous driving systems by strengthening both the source memory foundation and the self-driving car's contextual understanding. VLP achieves state-of-the-art end-to-end planning performance on the challenging NuScenes dataset by achieving 35.9\% and 60.5\% reduction in terms of average L2 error and collision rates, respectively, compared to the previous best method. Moreover, VLP shows improved performance in challenging long-tail scenarios and strong generalization capabilities when faced with new urban environments.

IVFeb 15, 2021Code
Zero-Shot Self-Supervised Learning for MRI Reconstruction

Burhaneddin Yaman, Seyed Amir Hossein Hosseini, Mehmet Akçakaya

Deep learning (DL) has emerged as a powerful tool for accelerated MRI reconstruction, but often necessitates a database of fully-sampled measurements for training. Recent self-supervised and unsupervised learning approaches enable training without fully-sampled data. However, a database of undersampled measurements may not be available in many scenarios, especially for scans involving contrast or translational acquisitions in development. Moreover, recent studies show that database-trained models may not generalize well when the unseen measurements differ in terms of sampling pattern, acceleration rate, SNR, image contrast, and anatomy. Such challenges necessitate a new methodology to enable subject-specific DL MRI reconstruction without external training datasets, since it is clinically imperative to provide high-quality reconstructions that can be used to identify lesions/disease for \emph{every individual}. In this work, we propose a zero-shot self-supervised learning approach to perform subject-specific accelerated DL MRI reconstruction to tackle these issues. The proposed approach partitions the available measurements from a single scan into three disjoint sets. Two of these sets are used to enforce data consistency and define loss during training for self-supervision, while the last set serves to self-validate, establishing an early stopping criterion. In the presence of models pre-trained on a database with different image characteristics, we show that the proposed approach can be combined with transfer learning for faster convergence time and reduced computational complexity. The code is available at \url{https://github.com/byaman14/ZS-SSL}.

CVMar 13, 2024
CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow

Chenbin Pan, Burhaneddin Yaman, Senem Velipasalar et al.

Autonomous driving stands as a pivotal domain in computer vision, shaping the future of transportation. Within this paradigm, the backbone of the system plays a crucial role in interpreting the complex environment. However, a notable challenge has been the loss of clear supervision when it comes to Bird's Eye View elements. To address this limitation, we introduce CLIP-BEVFormer, a novel approach that leverages the power of contrastive learning techniques to enhance the multi-view image-derived BEV backbones with ground truth information flow. We conduct extensive experiments on the challenging nuScenes dataset and showcase significant and consistent improvements over the SOTA. Specifically, CLIP-BEVFormer achieves an impressive 8.5\% and 9.2\% enhancement in terms of NDS and mAP, respectively, over the previous best BEV model on the 3D object detection task.

CVFeb 27, 2025
BEVDiffuser: Plug-and-Play Diffusion Model for BEV Denoising with Ground-Truth Guidance

Xin Ye, Burhaneddin Yaman, Sheng Cheng et al.

Bird's-eye-view (BEV) representations play a crucial role in autonomous driving tasks. Despite recent advancements in BEV generation, inherent noise, stemming from sensor limitations and the learning process, remains largely unaddressed, resulting in suboptimal BEV representations that adversely impact the performance of downstream tasks. To address this, we propose BEVDiffuser, a novel diffusion model that effectively denoises BEV feature maps using the ground-truth object layout as guidance. BEVDiffuser can be operated in a plug-and-play manner during training time to enhance existing BEV models without requiring any architectural modifications. Extensive experiments on the challenging nuScenes dataset demonstrate BEVDiffuser's exceptional denoising and generation capabilities, which enable significant enhancement to existing BEV models, as evidenced by notable improvements of 12.3\% in mAP and 10.1\% in NDS achieved for 3D object detection without introducing additional computational complexity. Moreover, substantial improvements in long-tail object detection and under challenging weather and lighting conditions further validate BEVDiffuser's effectiveness in denoising and enhancing BEV representations.

ROJan 22, 2025
AdaWM: Adaptive World Model based Planning for Autonomous Driving

Hang Wang, Xin Ye, Feng Tao et al.

World model based reinforcement learning (RL) has emerged as a promising approach for autonomous driving, which learns a latent dynamics model and uses it to train a planning policy. To speed up the learning process, the pretrain-finetune paradigm is often used, where online RL is initialized by a pretrained model and a policy learned offline. However, naively performing such initialization in RL may result in dramatic performance degradation during the online interactions in the new task. To tackle this challenge, we first analyze the performance degradation and identify two primary root causes therein: the mismatch of the planning policy and the mismatch of the dynamics model, due to distribution shift. We further analyze the effects of these factors on performance degradation during finetuning, and our findings reveal that the choice of finetuning strategies plays a pivotal role in mitigating these effects. We then introduce AdaWM, an Adaptive World Model based planning method, featuring two key steps: (a) mismatch identification, which quantifies the mismatches and informs the finetuning strategy, and (b) alignment-driven finetuning, which selectively updates either the policy or the model as needed using efficient low-rank updates. Extensive experiments on the challenging CARLA driving tasks demonstrate that AdaWM significantly improves the finetuning process, resulting in more robust and efficient performance in autonomous driving systems.

ROMar 27, 2024
LORD: Large Models based Opposite Reward Design for Autonomous Driving

Xin Ye, Feng Tao, Abhirup Mallik et al.

Reinforcement learning (RL) based autonomous driving has emerged as a promising alternative to data-driven imitation learning approaches. However, crafting effective reward functions for RL poses challenges due to the complexity of defining and quantifying good driving behaviors across diverse scenarios. Recently, large pretrained models have gained significant attention as zero-shot reward models for tasks specified with desired linguistic goals. However, the desired linguistic goals for autonomous driving such as "drive safely" are ambiguous and incomprehensible by pretrained models. On the other hand, undesired linguistic goals like "collision" are more concrete and tractable. In this work, we introduce LORD, a novel large models based opposite reward design through undesired linguistic goals to enable the efficient use of large pretrained models as zero-shot reward models. Through extensive experiments, our proposed framework shows its efficiency in leveraging the power of large pretrained models for achieving safe and enhanced autonomous driving. Moreover, the proposed approach shows improved generalization capabilities as it outperforms counterpart methods across diverse and challenging driving scenarios.

CVNov 16, 2024
MTA: Multimodal Task Alignment for BEV Perception and Captioning

Yunsheng Ma, Burhaneddin Yaman, Xin Ye et al.

Bird's eye view (BEV)-based 3D perception plays a crucial role in autonomous driving applications. The rise of large language models has spurred interest in BEV-based captioning to understand object behavior in the surrounding environment. However, existing approaches treat perception and captioning as separate tasks, focusing on the performance of only one task and overlooking the potential benefits of multimodal alignment. To bridge this gap between modalities, we introduce MTA, a novel multimodal task alignment framework that boosts both BEV perception and captioning. MTA consists of two key components: (1) BEV-Language Alignment (BLA), a contextual learning mechanism that aligns the BEV scene representations with ground-truth language representations, and (2) Detection-Captioning Alignment (DCA), a cross-modal prompting mechanism that aligns detection and captioning outputs. MTA seamlessly integrates into state-of-the-art baselines during training, adding no extra computational complexity at runtime. Extensive experiments on the nuScenes and TOD3Cap datasets show that MTA significantly outperforms state-of-the-art baselines in both tasks, achieving a 10.7% improvement in challenging rare perception scenarios and a 9.2% improvement in captioning. These results underscore the effectiveness of unified alignment in reconciling BEV-based perception and captioning.

CVMay 21, 2025
ALN-P3: Unified Language Alignment for Perception, Prediction, and Planning in Autonomous Driving

Yunsheng Ma, Burhaneddin Yaman, Xin Ye et al.

Recent advances have explored integrating large language models (LLMs) into end-to-end autonomous driving systems to enhance generalization and interpretability. However, most existing approaches are limited to either driving performance or vision-language reasoning, making it difficult to achieve both simultaneously. In this paper, we propose ALN-P3, a unified co-distillation framework that introduces cross-modal alignment between "fast" vision-based autonomous driving systems and "slow" language-driven reasoning modules. ALN-P3 incorporates three novel alignment mechanisms: Perception Alignment (P1A), Prediction Alignment (P2A), and Planning Alignment (P3A), which explicitly align visual tokens with corresponding linguistic outputs across the full perception, prediction, and planning stack. All alignment modules are applied only during training and incur no additional costs during inference. Extensive experiments on four challenging benchmarks-nuScenes, Nu-X, TOD3Cap, and nuScenes QA-demonstrate that ALN-P3 significantly improves both driving decisions and language reasoning, achieving state-of-the-art results.

CVSep 4, 2023
SSVOD: Semi-Supervised Video Object Detection with Sparse Annotations

Tanvir Mahmud, Chun-Hao Liu, Burhaneddin Yaman et al.

Despite significant progress in semi-supervised learning for image object detection, several key issues are yet to be addressed for video object detection: (1) Achieving good performance for supervised video object detection greatly depends on the availability of annotated frames. (2) Despite having large inter-frame correlations in a video, collecting annotations for a large number of frames per video is expensive, time-consuming, and often redundant. (3) Existing semi-supervised techniques on static images can hardly exploit the temporal motion dynamics inherently present in videos. In this paper, we introduce SSVOD, an end-to-end semi-supervised video object detection framework that exploits motion dynamics of videos to utilize large-scale unlabeled frames with sparse annotations. To selectively assemble robust pseudo-labels across groups of frames, we introduce \textit{flow-warped predictions} from nearby frames for temporal-consistency estimation. In particular, we introduce cross-IoU and cross-divergence based selection methods over a set of estimated predictions to include robust pseudo-labels for bounding boxes and class labels, respectively. To strike a balance between confirmation bias and uncertainty noise in pseudo-labels, we propose confidence threshold based combination of hard and soft pseudo-labels. Our method achieves significant performance improvements over existing methods on ImageNet-VID, Epic-KITCHENS, and YouTube-VIS datasets. Code and pre-trained models will be released.

CVMay 14, 2023
Instance-Aware Repeat Factor Sampling for Long-Tailed Object Detection

Burhaneddin Yaman, Tanvir Mahmud, Chun-Hao Liu

We propose an embarrassingly simple method -- instance-aware repeat factor sampling (IRFS) to address the problem of imbalanced data in long-tailed object detection. Imbalanced datasets in real-world object detection often suffer from a large disparity in the number of instances for each class. To improve the generalization performance of object detection models on rare classes, various data sampling techniques have been proposed. Repeat factor sampling (RFS) has shown promise due to its simplicity and effectiveness. Despite its efficiency, RFS completely neglects the instance counts and solely relies on the image count during re-sampling process. However, instance count may immensely vary for different classes with similar image counts. Such variation highlights the importance of both image and instance for addressing the long-tail distributions. Thus, we propose IRFS which unifies instance and image counts for the re-sampling process to be aware of different perspectives of the imbalance in long-tailed datasets. Our method shows promising results on the challenging LVIS v1.0 benchmark dataset over various architectures and backbones, demonstrating their effectiveness in improving the performance of object detection models on rare classes with a relative $+50\%$ average precision (AP) improvement over counterpart RFS. IRFS can serve as a strong baseline and be easily incorporated into existing long-tailed frameworks.

CVSep 23, 2021
End-to-End AI-based MRI Reconstruction and Lesion Detection Pipeline for Evaluation of Deep Learning Image Reconstruction

Ruiyang Zhao, Yuxin Zhang, Burhaneddin Yaman et al.

Deep learning techniques have emerged as a promising approach to highly accelerated MRI. However, recent reconstruction challenges have shown several drawbacks in current deep learning approaches, including the loss of fine image details even using models that perform well in terms of global quality metrics. In this study, we propose an end-to-end deep learning framework for image reconstruction and pathology detection, which enables a clinically aware evaluation of deep learning reconstruction quality. The solution is demonstrated for a use case in detecting meniscal tears on knee MRI studies, ultimately finding a loss of fine image details with common reconstruction methods expressed as a reduced ability to detect important pathology like meniscal tears. Despite the common practice of quantitative reconstruction methodology evaluation with metrics such as SSIM, impaired pathology detection as an automated pathology-based reconstruction evaluation approach suggests existing quantitative methods do not capture clinically important reconstruction outcomes.

IVSep 8, 2021
fastMRI+: Clinical Pathology Annotations for Knee and Brain Fully Sampled Multi-Coil MRI Data

Ruiyang Zhao, Burhaneddin Yaman, Yuxin Zhang et al.

Improving speed and image quality of Magnetic Resonance Imaging (MRI) via novel reconstruction approaches remains one of the highest impact applications for deep learning in medical imaging. The fastMRI dataset, unique in that it contains large volumes of raw MRI data, has enabled significant advances in accelerating MRI using deep learning-based reconstruction methods. While the impact of the fastMRI dataset on the field of medical imaging is unquestioned, the dataset currently lacks clinical expert pathology annotations, critical to addressing clinically relevant reconstruction frameworks and exploring important questions regarding rendering of specific pathology using such novel approaches. This work introduces fastMRI+, which consists of 16154 subspecialist expert bounding box annotations and 13 study-level labels for 22 different pathology categories on the fastMRI knee dataset, and 7570 subspecialist expert bounding box annotations and 643 study-level labels for 30 different pathology categories for the fastMRI brain dataset. The fastMRI+ dataset is open access and aims to support further research and advancement of medical imaging in MRI reconstruction and beyond.

IVMay 17, 2021
Unsupervised Deep Learning Methods for Biological Image Reconstruction and Enhancement

Mehmet Akçakaya, Burhaneddin Yaman, Hyungjin Chung et al.

Recently, deep learning approaches have become the main research frontier for biological image reconstruction and enhancement problems thanks to their high performance, along with their ultra-fast inference times. However, due to the difficulty of obtaining matched reference data for supervised learning, there has been increasing interest in unsupervised learning approaches that do not need paired reference data. In particular, self-supervised learning and generative models have been successfully used for various biological imaging applications. In this paper, we overview these approaches from a coherent perspective in the context of classical inverse problems, and discuss their applications to biological imaging, including electron, fluorescence and deconvolution microscopy, optical diffraction tomography and functional neuroimaging.

IVMay 12, 2021
20-fold Accelerated 7T fMRI Using Referenceless Self-Supervised Deep Learning Reconstruction

Omer Burak Demirel, Burhaneddin Yaman, Logan Dowdle et al.

High spatial and temporal resolution across the whole brain is essential to accurately resolve neural activities in fMRI. Therefore, accelerated imaging techniques target improved coverage with high spatio-temporal resolution. Simultaneous multi-slice (SMS) imaging combined with in-plane acceleration are used in large studies that involve ultrahigh field fMRI, such as the Human Connectome Project. However, for even higher acceleration rates, these methods cannot be reliably utilized due to aliasing and noise artifacts. Deep learning (DL) reconstruction techniques have recently gained substantial interest for improving highly-accelerated MRI. Supervised learning of DL reconstructions generally requires fully-sampled training datasets, which is not available for high-resolution fMRI studies. To tackle this challenge, self-supervised learning has been proposed for training of DL reconstruction with only undersampled datasets, showing similar performance to supervised learning. In this study, we utilize a self-supervised physics-guided DL reconstruction on a 5-fold SMS and 4-fold in-plane accelerated 7T fMRI data. Our results show that our self-supervised DL reconstruction produce high-quality images at this 20-fold acceleration, substantially improving on existing methods, while showing similar functional precision and temporal effects in the subsequent analysis compared to a standard 10-fold accelerated acquisition.

IVMay 10, 2021
Improved Simultaneous Multi-Slice Functional MRI Using Self-supervised Deep Learning

Omer Burak Demirel, Burhaneddin Yaman, Logan Dowdle et al.

Functional MRI (fMRI) is commonly used for interpreting neural activities across the brain. Numerous accelerated fMRI techniques aim to provide improved spatiotemporal resolutions. Among these, simultaneous multi-slice (SMS) imaging has emerged as a powerful strategy, becoming a part of large-scale studies, such as the Human Connectome Project. However, when SMS imaging is combined with in-plane acceleration for higher acceleration rates, conventional SMS reconstruction methods may suffer from noise amplification and other artifacts. Recently, deep learning (DL) techniques have gained interest for improving MRI reconstruction. However, these methods are typically trained in a supervised manner that necessitates fully-sampled reference data, which is not feasible in highly-accelerated fMRI acquisitions. Self-supervised learning that does not require fully-sampled data has recently been proposed and has shown similar performance to supervised learning. However, it has only been applied for in-plane acceleration. Furthermore the effect of DL reconstruction on subsequent fMRI analysis remains unclear. In this work, we extend self-supervised DL reconstruction to SMS imaging. Our results on prospectively 10-fold accelerated 7T fMRI data show that self-supervised DL reduces reconstruction noise and suppresses residual artifacts. Subsequent fMRI analysis remains unaltered by DL processing, while the improved temporal signal-to-noise ratio produces higher coherence estimates between task runs.

IVFeb 25, 2021
On Instabilities of Conventional Multi-Coil MRI Reconstruction to Small Adverserial Perturbations

Chi Zhang, Jinghan Jia, Burhaneddin Yaman et al.

Although deep learning (DL) has received much attention in accelerated MRI, recent studies suggest small perturbations may lead to instabilities in DL-based reconstructions, leading to concern for their clinical application. However, these works focus on single-coil acquisitions, which is not practical. We investigate instabilities caused by small adversarial attacks for multi-coil acquisitions. Our results suggest that, parallel imaging and multi-coil CS exhibit considerable instabilities against small adversarial perturbations.

IVNov 18, 2020
Self-Supervised Physics-Guided Deep Learning Reconstruction For High-Resolution 3D LGE CMR

Burhaneddin Yaman, Chetan Shenoy, Zilin Deng et al.

Late gadolinium enhancement (LGE) cardiac MRI (CMR) is the clinical standard for diagnosis of myocardial scar. 3D isotropic LGE CMR provides improved coverage and resolution compared to 2D imaging. However, image acceleration is required due to long scan times and contrast washout. Physics-guided deep learning (PG-DL) approaches have recently emerged as an improved accelerated MRI strategy. Training of PG-DL methods is typically performed in supervised manner requiring fully-sampled data as reference, which is challenging in 3D LGE CMR. Recently, a self-supervised learning approach was proposed to enable training PG-DL techniques without fully-sampled data. In this work, we extend this self-supervised learning approach to 3D imaging, while tackling challenges related to small training database sizes of 3D volumes. Results and a reader study on prospectively accelerated 3D LGE show that the proposed approach at 6-fold acceleration outperforms the clinically utilized compressed sensing approach at 3-fold acceleration.

IVOct 26, 2020
Improved Supervised Training of Physics-Guided Deep Learning Image Reconstruction with Multi-Masking

Burhaneddin Yaman, Seyed Amir Hossein Hosseini, Steen Moeller et al.

Physics-guided deep learning (PG-DL) via algorithm unrolling has received significant interest for improved image reconstruction, including MRI applications. These methods unroll an iterative optimization algorithm into a series of regularizer and data consistency units. The unrolled networks are typically trained end-to-end using a supervised approach. Current supervised PG-DL approaches use all of the available sub-sampled measurements in their data consistency units. Thus, the network learns to fit the rest of the measurements. In this study, we propose to improve the performance and robustness of supervised training by utilizing randomness by retrospectively selecting only a subset of all the available measurements for data consistency units. The process is repeated multiple times using different random masks during training for further enhancement. Results on knee MRI show that the proposed multi-mask supervised PG-DL enhances reconstruction performance compared to conventional supervised PG-DL approaches.

IVAug 13, 2020
Multi-Mask Self-Supervised Learning for Physics-Guided Neural Networks in Highly Accelerated MRI

Burhaneddin Yaman, Hongyi Gu, Seyed Amir Hossein Hosseini et al.

Self-supervised learning has shown great promise due to its capability to train deep learning MRI reconstruction methods without fully-sampled data. Current self-supervised learning methods for physics-guided reconstruction networks split acquired undersampled data into two disjoint sets, where one is used for data consistency (DC) in the unrolled network and the other to define the training loss. In this study, we propose an improved self-supervised learning strategy that more efficiently uses the acquired data to train a physics-guided reconstruction network without a database of fully-sampled data. The proposed multi-mask self-supervised learning via data undersampling (SSDU) applies a hold-out masking operation on acquired measurements to split it into multiple pairs of disjoint sets for each training sample, while using one of these pairs for DC units and the other for defining loss, thereby more efficiently using the undersampled data. Multi-mask SSDU is applied on fully-sampled 3D knee and prospectively undersampled 3D brain MRI datasets, for various acceleration rates and patterns, and compared to CG-SENSE and single-mask SSDU DL-MRI, as well as supervised DL-MRI when fully-sampled data is available. Results on knee MRI show that the proposed multi-mask SSDU outperforms SSDU and performs closely with supervised DL-MRI. A clinical reader study further ranks the multi-mask SSDU higher than supervised DL-MRI in terms of SNR and aliasing artifacts. Results on brain MRI show that multi-mask SSDU achieves better reconstruction quality compared to SSDU. Reader study demonstrates that multi-mask SSDU at R=8 significantly improves reconstruction compared to single-mask SSDU at R=8, as well as CG-SENSE at R=2.

IVJun 16, 2020
Noise2Inpaint: Learning Referenceless Denoising by Inpainting Unrolling

Burhaneddin Yaman, Seyed Amir Hossein Hosseini, Mehmet Akçakaya

Deep learning based image denoising methods have been recently popular due to their improved performance. Traditionally, these methods are trained in a supervised manner, requiring a set of noisy input and clean target image pairs. More recently, self-supervised approaches have been proposed to learn denoising from only noisy images. These methods assume that noise across pixels is statistically independent, and the underlying image pixels show spatial correlations across neighborhoods. These methods rely on a masking approach that divides the image pixels into two disjoint sets, where one is used as input to the network while the other is used to define the loss. However, these previous self-supervised approaches rely on a purely data-driven regularization neural network without explicitly taking the masking model into account. In this work, building on these self-supervised approaches, we introduce Noise2Inpaint (N2I), a training approach that recasts the denoising problem into a regularized image inpainting framework. This allows us to use an objective function, which can incorporate different statistical properties of the noise as needed. We use algorithm unrolling to unroll an iterative optimization for solving this objective function and train the unrolled network end-to-end. The training paradigm follows the masking approach from previous works, splitting the pixels into two disjoint sets. Importantly, one of these is now used to impose data fidelity in the unrolled network, while the other still defines the loss. We demonstrate that N2I performs successful denoising on real-world datasets, while better preserving details compared to its purely data-driven counterpart Noise2Self.

IVMay 12, 2020
High-Fidelity Accelerated MRI Reconstruction by Scan-Specific Fine-Tuning of Physics-Based Neural Networks

Seyed Amir Hossein Hosseini, Burhaneddin Yaman, Steen Moeller et al.

Long scan duration remains a challenge for high-resolution MRI. Deep learning has emerged as a powerful means for accelerated MRI reconstruction by providing data-driven regularizers that are directly learned from data. These data-driven priors typically remain unchanged for future data in the testing phase once they are learned during training. In this study, we propose to use a transfer learning approach to fine-tune these regularizers for new subjects using a self-supervision approach. While the proposed approach can compromise the extremely fast reconstruction time of deep learning MRI methods, our results on knee MRI indicate that such adaptation can substantially reduce the remaining artifacts in reconstructed images. In addition, the proposed approach has the potential to reduce the risks of generalization to rare pathological conditions, which may be unavailable in the training data.

IVDec 16, 2019
Self-Supervised Learning of Physics-Guided Reconstruction Neural Networks without Fully-Sampled Reference Data

Burhaneddin Yaman, Seyed Amir Hossein Hosseini, Steen Moeller et al.

Purpose: To develop a strategy for training a physics-guided MRI reconstruction neural network without a database of fully-sampled datasets. Theory and Methods: Self-supervised learning via data under-sampling (SSDU) for physics-guided deep learning (DL) reconstruction partitions available measurements into two disjoint sets, one of which is used in the data consistency units in the unrolled network and the other is used to define the loss for training. The proposed training without fully-sampled data is compared to fully-supervised training with ground-truth data, as well as conventional compressed sensing and parallel imaging methods using the publicly available fastMRI knee database. The same physics-guided neural network is used for both proposed SSDU and supervised training. The SSDU training is also applied to prospectively 2-fold accelerated high-resolution brain datasets at different acceleration rates, and compared to parallel imaging. Results: Results on five different knee sequences at acceleration rate of 4 shows that proposed self-supervised approach performs closely with supervised learning, while significantly outperforming conventional compressed sensing and parallel imaging, as characterized by quantitative metrics and a clinical reader study. The results on prospectively sub-sampled brain datasets, where supervised learning cannot be employed due to lack of ground-truth reference, show that the proposed self-supervised approach successfully perform reconstruction at high acceleration rates (4, 6 and 8). Image readings indicate improved visual reconstruction quality with the proposed approach compared to parallel imaging at acquisition acceleration. Conclusion: The proposed SSDU approach allows training of physics-guided DL-MRI reconstruction without fully-sampled data, while achieving comparable results with supervised DL-MRI trained on fully-sampled data.

IVDec 16, 2019
Dense Recurrent Neural Networks for Accelerated MRI: History-Cognizant Unrolling of Optimization Algorithms

Seyed Amir Hossein Hosseini, Burhaneddin Yaman, Steen Moeller et al.

Inverse problems for accelerated MRI typically incorporate domain-specific knowledge about the forward encoding operator in a regularized reconstruction framework. Recently physics-driven deep learning (DL) methods have been proposed to use neural networks for data-driven regularization. These methods unroll iterative optimization algorithms to solve the inverse problem objective function, by alternating between domain-specific data consistency and data-driven regularization via neural networks. The whole unrolled network is then trained end-to-end to learn the parameters of the network. Due to simplicity of data consistency updates with gradient descent steps, proximal gradient descent (PGD) is a common approach to unroll physics-driven DL reconstruction methods. However, PGD methods have slow convergence rates, necessitating a higher number of unrolled iterations, leading to memory issues in training and slower reconstruction times in testing. Inspired by efficient variants of PGD methods that use a history of the previous iterates, we propose a history-cognizant unrolling of the optimization algorithm with dense connections across iterations for improved performance. In our approach, the gradient descent steps are calculated at a trainable combination of the outputs of all the previous regularization units. We also apply this idea to unrolling variable splitting methods with quadratic relaxation. Our results in reconstruction of the fastMRI knee dataset show that the proposed history-cognizant approach reduces residual aliasing artifacts compared to its conventional unrolled counterpart without requiring extra computational power or increasing reconstruction time.

IVOct 21, 2019
Self-Supervised Physics-Based Deep Learning MRI Reconstruction Without Fully-Sampled Data

Burhaneddin Yaman, Seyed Amir Hossein Hosseini, Steen Moeller et al.

Deep learning (DL) has emerged as a tool for improving accelerated MRI reconstruction. A common strategy among DL methods is the physics-based approach, where a regularized iterative algorithm alternating between data consistency and a regularizer is unrolled for a finite number of iterations. This unrolled network is then trained end-to-end in a supervised manner, using fully-sampled data as ground truth for the network output. However, in a number of scenarios, it is difficult to obtain fully-sampled datasets, due to physiological constraints such as organ motion or physical constraints such as signal decay. In this work, we tackle this issue and propose a self-supervised learning strategy that enables physics-based DL reconstruction without fully-sampled data. Our approach is to divide the acquired sub-sampled points for each scan into training and validation subsets. During training, data consistency is enforced over the training subset, while the validation subset is used to define the loss function. Results show that the proposed self-supervised learning method successfully reconstructs images without fully-sampled data, performing similarly to the supervised approach that is trained with fully-sampled references. This has implications for physics-based inverse problem approaches for other settings, where fully-sampled data is not available or possible to acquire.