IV · Mar 28, 2023
fRegGAN with K-space Loss Regularization for Medical Image Translation
Ivo M. Baltruschat, Felix Kreis, Alexander Hoelscher et al.
Generative adversarial networks (GANs) have shown remarkable success in generating realistic images and are increasingly used in medical imaging for image-to-image translation tasks. However, GANs tend to suffer from a frequency bias towards low frequencies, which can lead to the removal of important structures in the generated images. To address this issue, we propose a novel frequency-aware image-to-image translation framework based on the supervised RegGAN approach, which we call fRegGAN. The framework employs a K-space loss to regularize the frequency content of the generated images and incorporates well-known properties of MRI K-space geometry to guide the network training process. By combining our method with the RegGAN approach, we can mitigate the effects of both training with misaligned data and frequency bias at the same time. We evaluate our method on the public BraTS dataset and outperform the baseline methods in terms of both quantitative and qualitative metrics when synthesizing T2-weighted from T1-weighted MR images. Detailed ablation studies are provided to understand the effect of each modification on the final performance. The proposed method is a step towards improving the performance of image-to-image translation and synthesis in the medical domain and shows promise for other applications in the field of image processing and generation.
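The abstract does not spell out the loss formula. The sketch below is a minimal NumPy illustration of the general idea of a K-space loss. It assumes an L1 distance between the centered 2D Fourier transforms of the generated and target images. The `radial_weight` helper and its `alpha` parameter are hypothetical. They show one way high-frequency coefficients could be emphasized and are not taken from fRegGAN.

```python
from typing import Optional

import numpy as np


def kspace(image: np.ndarray) -> np.ndarray:
    """Centered 2D K-space (discrete Fourier transform) of an image."""
    return np.fft.fftshift(np.fft.fft2(image))


def radial_weight(shape: tuple, alpha: float = 1.0) -> np.ndarray:
    """Hypothetical weight that grows linearly with distance from the K-space center."""
    h, w = shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - h // 2, xx - w // 2)  # fftshift puts the DC term at (h//2, w//2)
    return 1.0 + alpha * r / r.max()


def kspace_l1_loss(pred: np.ndarray, target: np.ndarray,
                   weight: Optional[np.ndarray] = None) -> float:
    """Mean (optionally weighted) magnitude of the complex K-space difference."""
    diff = np.abs(kspace(pred) - kspace(target))
    if weight is not None:
        diff = diff * weight
    return float(diff.mean())


# Usage: averaging shifted copies attenuates high frequencies, which the loss picks up.
rng = np.random.default_rng(0)
sharp = rng.random((32, 32))
blurred = (sharp + np.roll(sharp, 1, axis=0) + np.roll(sharp, 1, axis=1)) / 3.0
w = radial_weight(sharp.shape)
print(kspace_l1_loss(blurred, sharp), kspace_l1_loss(blurred, sharp, w))
```

In actual GAN training, the same computation would be written in a differentiable framework and added to the adversarial and image-space terms.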
IV · Nov 20, 2023
Uncertainty Estimation in Contrast-Enhanced MR Image Translation with Multi-Axis Fusion
Ivo M. Baltruschat, Parvaneh Janbakhshi, Melanie Dohmen et al.
In recent years, deep learning has been applied to a wide range of medical imaging and image processing tasks. In this work, we focus on the estimation of epistemic uncertainty for 3D medical image-to-image translation. We propose a novel model uncertainty quantification method, Multi-Axis Fusion (MAF), which relies on the integration of complementary information derived from multiple views of volumetric image data. The proposed approach is applied to the task of synthesizing contrast-enhanced T1-weighted images based on native T1, T2 and T2-FLAIR scans. The quantitative findings indicate a strong correlation ($\rho_{\text{healthy}} = 0.89$) between the mean absolute image synthesis error and the mean uncertainty score for our MAF method. Hence, we consider MAF a promising approach to the highly relevant task of detecting synthesis failures at inference time.
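The sketch below is for illustration only. It assumes that MAF applies three 2D slice-wise translation models, one per anatomical axis, to the same input volume. The voxel-wise mean of the three predictions serves as the output, and their voxel-wise standard deviation serves as the uncertainty map. The fusion rule and the toy models are assumptions. The use of Pearson correlation (`np.corrcoef`) is also an assumption, because the abstract does not state which coefficient $\rho$ denotes.

```python
from typing import Callable, Sequence, Tuple

import numpy as np

SliceModel = Callable[[np.ndarray], np.ndarray]  # maps a 2D slice to a 2D slice


def predict_along_axis(volume: np.ndarray, model: SliceModel, axis: int) -> np.ndarray:
    """Apply a 2D model slice by slice along `axis` and restack to the input shape."""
    slices = [model(s) for s in np.moveaxis(volume, axis, 0)]
    return np.moveaxis(np.stack(slices), 0, axis)


def multi_axis_fusion(volume: np.ndarray,
                      models: Sequence[SliceModel]) -> Tuple[np.ndarray, np.ndarray]:
    """Fuse per-axis predictions (one model per axis): mean = output, std = uncertainty."""
    preds = np.stack([predict_along_axis(volume, m, ax) for ax, m in enumerate(models)])
    return preds.mean(axis=0), preds.std(axis=0)


def error_uncertainty_correlation(errors: Sequence[float],
                                  uncertainties: Sequence[float]) -> float:
    """Pearson correlation between per-case mean error and mean uncertainty."""
    return float(np.corrcoef(errors, uncertainties)[0, 1])


# Usage with toy "models" that disagree slightly depending on the slicing axis.
vol = np.random.default_rng(1).random((8, 8, 8))
toy_models = [lambda s: s, lambda s: 1.1 * s, lambda s: 0.9 * s]
fused, uncertainty = multi_axis_fusion(vol, toy_models)
```

Where the per-axis predictions agree, the uncertainty map is near zero. Large disagreement flags voxels whose synthesis is less trustworthy.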
IV · Aug 12, 2024
Five Pitfalls When Assessing Synthetic Medical Images with Reference Metrics
Melanie Dohmen, Tuan Truong, Ivo M. Baltruschat et al.
Reference metrics have been developed to objectively and quantitatively compare two images. Especially for evaluating the quality of reconstructed or compressed images, these metrics have proven very useful. Extensive tests of such metrics on benchmarks of artificially distorted natural images have revealed which metrics correlate best with human perception of quality. Direct transfer of these metrics to the evaluation of generative models in medical imaging, however, can easily lead to pitfalls, because assumptions about image content, image data format and image interpretation are often very different. Also, the correlation of reference metrics with human perception of quality can vary strongly for different kinds of distortions, and commonly used metrics such as SSIM, PSNR and MAE are not the best choice for all situations. We selected five pitfalls that showcase unexpected and probably undesired reference metric scores and discuss strategies to avoid them.
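Two simple, commonly discussed effects can be reproduced in a few lines of NumPy. We do not claim that they correspond to the specific five pitfalls selected in the paper. First, PSNR depends on the assumed `data_range`, which is not fixed for MR intensities. Second, a large zero-valued background dilutes MAE. The helpers are plain textbook definitions, not the paper's code, and the intensities are synthetic.

```python
import numpy as np


def mae(ref: np.ndarray, img: np.ndarray) -> float:
    """Mean absolute error over all pixels."""
    return float(np.mean(np.abs(ref - img)))


def psnr(ref: np.ndarray, img: np.ndarray, data_range: float) -> float:
    """Peak signal-to-noise ratio in dB for a given intensity range."""
    mse = np.mean((ref - img) ** 2)
    return float(10.0 * np.log10(data_range ** 2 / mse))


rng = np.random.default_rng(2)
ref = rng.uniform(0.0, 1000.0, size=(64, 64))      # MR-like intensities, no fixed maximum
img = ref + rng.normal(0.0, 20.0, size=ref.shape)  # "synthetic" image with noise

# Same images, different assumed range -> different PSNR.
psnr_image_range = psnr(ref, img, data_range=ref.max() - ref.min())
psnr_12bit_range = psnr(ref, img, data_range=4095.0)

# Zero background padding halves the MAE without changing the depicted content.
pad = ((0, 0), (0, 64))  # append 64 zero columns to each image
mae_cropped = mae(ref, img)
mae_padded = mae(np.pad(ref, pad), np.pad(img, pad))
```

The PSNR gap equals $20 \log_{10}(4095 / \text{range})$ dB, so reported PSNR values are only comparable when the data range convention is stated.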
IV · May 14, 2024
Similarity and Quality Metrics for MR Image-To-Image Translation
Melanie Dohmen, Mark A. Klemens, Ivo M. Baltruschat et al.
Image-to-image translation can have a large impact in medical imaging, as images can be synthetically transformed to other modalities, sequence types, higher resolutions or lower noise levels. To ensure patient safety, these methods should be validated by human readers, which requires considerable time and cost. Quantitative metrics can effectively complement such studies and provide reproducible and objective assessment of synthetic images. If a reference is available, the similarity of MR images is frequently evaluated with SSIM and PSNR, even though these metrics are either insensitive or overly sensitive to specific distortions. When no reference images are available, non-reference quality metrics can reliably detect specific distortions, such as blurriness. To provide an overview of distortion sensitivity, we quantitatively analyze 11 similarity (reference) and 12 quality (non-reference) metrics for assessing synthetic images. We additionally include a metric based on a downstream segmentation task. We investigate the sensitivity to 11 kinds of distortions and typical MR artifacts, and analyze the influence of different normalization methods on each metric and distortion. Finally, we derive recommendations for the effective use of the analyzed similarity and quality metrics in the evaluation of image-to-image translation models.
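The sketch below illustrates why normalization matters. It compares MAE after three common intensity normalizations: min-max, z-score, and percentile clipping. These three are assumed examples and not necessarily the methods analyzed in the paper. A single bright outlier voxel in the synthetic image, a hypothetical artifact, distorts min-max scaling far more than percentile-based scaling.

```python
import numpy as np


def minmax(x: np.ndarray) -> np.ndarray:
    """Scale to [0, 1] using the global minimum and maximum."""
    return (x - x.min()) / (x.max() - x.min())


def zscore(x: np.ndarray) -> np.ndarray:
    """Zero mean, unit standard deviation."""
    return (x - x.mean()) / x.std()


def percentile_scale(x: np.ndarray, lo: float = 1.0, hi: float = 99.0) -> np.ndarray:
    """Map the [lo, hi] percentile range to [0, 1] and clip values outside it."""
    p_lo, p_hi = np.percentile(x, [lo, hi])
    return np.clip((x - p_lo) / (p_hi - p_lo), 0.0, 1.0)


def mae(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.mean(np.abs(a - b)))


ref = np.linspace(0.0, 1.0, 100).reshape(10, 10)
synthetic = ref.copy()
synthetic[0, 0] = 10.0  # one hot-pixel artifact

scores = {name: mae(f(ref), f(synthetic))
          for name, f in [("minmax", minmax), ("zscore", zscore),
                          ("percentile", percentile_scale)]}
```

Note that z-score MAE is in units of standard deviations, so it is not directly comparable to the two [0, 1]-scaled scores.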
IV · Mar 12, 2024
BraSyn 2023 challenge: Missing MRI synthesis and the effect of different learning objectives
Ivo M. Baltruschat, Parvaneh Janbakhshi, Matthias Lenga
This work addresses the Brain Magnetic Resonance Image Synthesis for Tumor Segmentation (BraSyn) challenge, which was hosted as part of the Brain Tumor Segmentation (BraTS) challenge in 2023. In this challenge, researchers are invited to synthesize a missing magnetic resonance image sequence from the other available sequences, so that tumor segmentation pipelines trained on complete sets of image sequences can still be applied. This problem can be tackled using deep learning within the framework of paired image-to-image translation. In this study, we investigate the effectiveness of a commonly used deep learning framework, Pix2Pix, trained under the supervision of different image-quality loss functions. Our results indicate that the choice of loss function significantly affects the synthesis quality. We systematically study the impact of various loss functions in the multi-sequence MR image synthesis setting of the BraSyn challenge. Furthermore, we demonstrate how image synthesis performance can be optimized by combining different learning objectives.
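The abstract does not list the individual loss functions. The sketch below assumes L1, L2 (MSE) and SSIM as example image-quality terms and combines them as a weighted sum. The SSIM here is a simplified single global window rather than the usual sliding Gaussian window, and the weights are hypothetical. A real Pix2Pix setup would compute these terms in a differentiable framework alongside the adversarial loss.

```python
from typing import Dict

import numpy as np


def l1_loss(pred: np.ndarray, target: np.ndarray) -> float:
    return float(np.mean(np.abs(pred - target)))


def l2_loss(pred: np.ndarray, target: np.ndarray) -> float:
    return float(np.mean((pred - target) ** 2))


def global_ssim(pred: np.ndarray, target: np.ndarray, data_range: float = 1.0) -> float:
    """Simplified SSIM computed over one global window (no sliding Gaussian window)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_p, mu_t = pred.mean(), target.mean()
    cov = np.mean((pred - mu_p) * (target - mu_t))
    num = (2 * mu_p * mu_t + c1) * (2 * cov + c2)
    den = (mu_p ** 2 + mu_t ** 2 + c1) * (pred.var() + target.var() + c2)
    return float(num / den)


def combined_loss(pred: np.ndarray, target: np.ndarray, weights: Dict[str, float]) -> float:
    """Weighted sum of image-quality terms; the keys of `weights` select the terms."""
    terms = {
        "l1": l1_loss(pred, target),
        "l2": l2_loss(pred, target),
        "ssim": 1.0 - global_ssim(pred, target),  # turn similarity into a loss
    }
    return sum(w * terms[name] for name, w in weights.items())


rng = np.random.default_rng(3)
target = rng.random((16, 16))
pred = np.clip(target + rng.normal(0.0, 0.05, target.shape), 0.0, 1.0)
loss = combined_loss(pred, target, {"l1": 1.0, "ssim": 0.5})  # hypothetical weights
```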
CV · Jan 16, 2025
Exploring AI-based System Design for Pixel-level Protected Health Information Detection in Medical Images
Tuan Truong, Ivo M. Baltruschat, Mark Klemens et al.
De-identification of medical images is a critical step to ensure privacy during data sharing in research and clinical settings. The initial step in this process involves detecting Protected Health Information (PHI), which can be found in image metadata or imprinted within image pixels. Despite the importance of such systems, there has been limited evaluation of existing AI-based solutions, creating barriers to the development of reliable and robust tools. In this study, we present an AI-based pipeline for PHI detection comprising three key modules: text detection, text extraction, and text analysis. We benchmark three models (YOLOv11, EasyOCR, and GPT-4o) across different setups corresponding to these modules, evaluating their performance on two datasets encompassing multiple imaging modalities and PHI categories. Our findings indicate that the optimal setup uses dedicated vision and language models for each module, which achieves a commendable balance of performance, latency, and the cost associated with using Large Language Models (LLMs). Additionally, we show that LLMs can not only identify PHI content but also enhance OCR and enable an end-to-end PHI detection pipeline, with promising results in our analysis.
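The three-module structure can be sketched as a pipeline of interchangeable callables. In the paper, these roles are filled by models such as YOLOv11, EasyOCR, and GPT-4o. In this sketch, the detector, OCR, and analyzer are hypothetical stand-ins and no real model APIs are called. The analyzer is a toy regular-expression matcher for dates and patient IDs.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # x0, y0, x1, y1 in pixels


@dataclass
class PHIFinding:
    box: Box
    text: str
    category: str


def toy_analyzer(text: str) -> List[str]:
    """Hypothetical rule-based stand-in for the text-analysis module (e.g. an LLM)."""
    categories = []
    if re.search(r"\b\d{2}[./-]\d{2}[./-]\d{4}\b", text):
        categories.append("date")
    if re.search(r"\bID[:\s]*\d+\b", text, flags=re.IGNORECASE):
        categories.append("patient_id")
    return categories


def detect_phi(image: object,
               detect_text: Callable[[object], List[Box]],
               extract_text: Callable[[object, Box], str],
               analyze_text: Callable[[str], List[str]] = toy_analyzer) -> List[PHIFinding]:
    """Run detection -> extraction -> analysis and collect PHI findings per text box."""
    findings = []
    for box in detect_text(image):            # module 1: text detection
        text = extract_text(image, box)       # module 2: text extraction (OCR)
        for category in analyze_text(text):   # module 3: text analysis
            findings.append(PHIFinding(box, text, category))
    return findings


# Usage with fake detector/OCR outputs in place of real models.
fake_boxes = [(0, 0, 50, 10), (0, 20, 50, 30)]
fake_text = {fake_boxes[0]: "DOB 01.02.1970", fake_boxes[1]: "L"}
found = detect_phi(None, lambda img: fake_boxes, lambda img, b: fake_text[b])
```

Keeping each module behind a plain callable makes it easy to swap a dedicated vision model for an LLM-based one in the same slot. This is the kind of setup comparison the abstract describes.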
LG · Jan 23, 2020
Smart Chest X-ray Worklist Prioritization using Artificial Intelligence: A Clinical Workflow Simulation
Ivo M. Baltruschat, Leonhard Steinmeister, Hannes Nickisch et al.
The aim is to evaluate whether smart worklist prioritization by artificial intelligence (AI) can optimize the radiology workflow and reduce report turnaround times (RTAT) for critical findings in chest radiographs (CXRs). Furthermore, we investigate a method to counteract the effect of false negative predictions by the AI. Such errors can cause extremely and dangerously long RTATs, because the affected CXRs are sorted to the end of the worklist. We developed a simulation framework that models the current workflow at a university hospital by incorporating hospital-specific CXR generation rates, reporting rates and pathology distribution. Using this framework, we simulated standard "first-in, first-out" (FIFO) worklist processing and compared it with a worklist prioritization based on urgency. Examination prioritization was performed by the AI, classifying eight different pathological findings ranked in descending order of urgency: pneumothorax, pleural effusion, infiltrate, congestion, atelectasis, cardiomegaly, mass and foreign object. Furthermore, we introduced an upper limit for the maximum waiting time, after which the highest urgency is assigned to the examination. The average RTAT for all critical findings was significantly reduced in all prioritization simulations compared to the FIFO simulation (e.g. pneumothorax: 35.6 min vs. 80.1 min; $p < 0.0001$). At the same time, the maximum RTAT increased for most findings (e.g. pneumothorax: 1293 min vs. 890 min; $p < 0.0001$). Our upper limit substantially reduced the maximum RTAT for all classes (e.g. pneumothorax: 979 min vs. 1293 min / 1178 min; $p < 0.0001$). Our simulations demonstrate that smart worklist prioritization by AI can reduce the average RTAT for critical findings in CXRs while keeping the maximum RTAT close to that of FIFO.
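The selection rule can be sketched as follows. This is not the authors' simulation framework, which also models CXR generation and reporting rates. It only shows, under assumed data structures, how FIFO, urgency-based prioritization, and the upper waiting-time limit choose the next examination. Two further assumptions apply: within the same urgency, older examinations are read first, and an AI-negative examination ranks below all eight findings.

```python
from dataclasses import dataclass
from typing import List, Optional

# Findings in descending order of urgency, as listed in the abstract.
URGENCY = ["pneumothorax", "pleural effusion", "infiltrate", "congestion",
           "atelectasis", "cardiomegaly", "mass", "foreign object"]


@dataclass
class Exam:
    exam_id: str
    arrival_min: float          # arrival time in minutes
    ai_finding: Optional[str]   # most urgent AI-predicted finding; None = AI-negative


def urgency_rank(exam: Exam, now: float, max_wait: Optional[float]) -> int:
    """0 = most urgent. Exams waiting at least `max_wait` minutes get the top rank."""
    if max_wait is not None and now - exam.arrival_min >= max_wait:
        return 0
    if exam.ai_finding is None:
        return len(URGENCY)  # AI-negative exams go after all findings
    return URGENCY.index(exam.ai_finding)


def next_exam(worklist: List[Exam], now: float, mode: str = "priority",
              max_wait: Optional[float] = None) -> Exam:
    """Pick the next exam to report under FIFO or urgency-based prioritization."""
    if mode == "fifo":
        return min(worklist, key=lambda e: e.arrival_min)
    return min(worklist, key=lambda e: (urgency_rank(e, now, max_wait), e.arrival_min))


worklist = [Exam("A", 0.0, None), Exam("B", 10.0, "pneumothorax"), Exam("C", 5.0, "mass")]
```

If exam A were a pneumothorax missed by the AI, pure prioritization would keep pushing it back. With `max_wait` set, it is promoted to the top rank once it has waited long enough.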
CV · Oct 17, 2018
When does Bone Suppression and Lung Field Segmentation Improve Chest X-Ray Disease Classification?
Ivo M. Baltruschat, Leonhard Steinmeister, Harald Ittrich et al.
Chest radiography is the most common clinical examination type. To improve the quality of patient care and to reduce workload, methods for automatic pathology classification have been developed. In this contribution, we investigate how two advanced image pre-processing techniques, initially developed to support image reading by radiologists, affect the performance of deep learning methods. First, we use bone suppression, an algorithm that artificially removes the rib cage. Second, we employ automatic lung field detection to crop the image to the lung area. Furthermore, we consider the combination of both in the context of an ensemble approach. In a five-times re-sampling scheme, we use Receiver Operating Characteristic (ROC) statistics to evaluate the effect of the pre-processing approaches. Using a Convolutional Neural Network (CNN) optimized for X-ray analysis, we achieve good average performance across all pathologies. Superior results are obtained for selected pathologies when using pre-processing; e.g., for mass, the area under the ROC curve increased by 9.95%. The ensemble including models trained on pre-processed images yields the best overall results.
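The ROC statistic used for evaluation can be computed from its rank interpretation. The AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting half. The sketch below also averages per-model scores as one simple way to form an ensemble. The abstract does not state the actual ensembling rule, so averaging is an assumption, and the scores are made up.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positives and negatives (O(P*N))."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ensemble_scores(*model_scores: Sequence[float]) -> List[float]:
    """Average the per-image scores of several models (assumed ensembling rule)."""
    return [sum(s) / len(s) for s in zip(*model_scores)]


labels = [0, 0, 1, 1]
raw = [0.1, 0.4, 0.35, 0.8]             # made-up scores, e.g. model on original images
bone_suppressed = [0.2, 0.1, 0.6, 0.7]  # made-up scores, e.g. model on bone-suppressed images
print(roc_auc(labels, raw), roc_auc(labels, ensemble_scores(raw, bone_suppressed)))
```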
CV · Mar 6, 2018
Comparison of Deep Learning Approaches for Multi-Label Chest X-Ray Classification
Ivo M. Baltruschat, Hannes Nickisch, Michael Grass et al.
The increased availability of X-ray image archives (e.g. the ChestX-ray14 dataset from the NIH Clinical Center) has triggered a growing interest in deep learning techniques. To provide better insight into the different approaches and their applications to chest X-ray classification, we investigate a powerful network architecture in detail: the ResNet-50. Building on prior work in this domain, we consider transfer learning with and without fine-tuning as well as the training of a dedicated X-ray network from scratch. To leverage the high spatial resolution of X-ray data, we also include an extended ResNet-50 architecture and a network integrating non-image data (patient age, gender and acquisition type) in the classification process. In a concluding experiment, we also investigate multiple ResNet depths (i.e. ResNet-38 and ResNet-101). In a systematic evaluation using 5-fold re-sampling and a multi-label loss function, we compare the performance of the different approaches for pathology classification by ROC statistics and analyze differences between the classifiers using rank correlation. Overall, we observe a considerable spread in the achieved performance and conclude that the X-ray-specific ResNet-38 integrating non-image data yields the best overall results. Furthermore, class activation maps are used to understand the classification process, and a detailed analysis of the impact of non-image features is provided.
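A multi-label setup treats each of the 14 ChestX-ray14 findings as an independent binary problem. The sketch below shows a plain per-class binary cross-entropy on sigmoid outputs. It also shows one hypothetical way of appending non-image features (age, gender, acquisition type) to an image embedding before the classifier. The feature encodings, the age scaling, the 2048-dimensional embedding, and the function names are assumptions, not the paper's exact design.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def multilabel_bce(logits: np.ndarray, targets: np.ndarray, eps: float = 1e-7) -> float:
    """Binary cross-entropy averaged over images and classes (labels are independent)."""
    p = np.clip(sigmoid(logits), eps, 1.0 - eps)
    return float(-np.mean(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p)))


def append_non_image_features(embedding: np.ndarray, age_years: float,
                              is_male: bool, is_pa_view: bool) -> np.ndarray:
    """Hypothetical encoding: scaled age plus two binary flags, concatenated."""
    extra = np.array([age_years / 100.0, float(is_male), float(is_pa_view)])
    return np.concatenate([embedding, extra])


num_classes = 14                     # ChestX-ray14 findings
logits = np.zeros((2, num_classes))  # an untrained model: p = 0.5 everywhere
targets = np.zeros((2, num_classes))
targets[0, 3] = 1.0                  # image 0 has the 4th finding
loss = multilabel_bce(logits, targets)
```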