CVJul 29, 2022
Transfer Learning for Segmentation Problems: Choose the Right Encoder and Skip the DecoderJonas Dippel, Matthias Lenga, Thomas Goerttler et al.
It is common practice to reuse models initially trained on different data to increase downstream task performance. Especially in the computer vision domain, ImageNet-pretrained weights have been successfully used for various tasks. In this work, we investigate the impact of transfer learning for segmentation problems, being pixel-wise classification problems that can be tackled with encoder-decoder architectures. We find that transfer learning the decoder does not help downstream segmentation tasks, while transfer learning the encoder is truly beneficial. We demonstrate that pretrained weights for a decoder may yield faster convergence, but they do not improve the overall model performance as one can obtain equivalent results with randomly initialized decoders. However, we show that it is more effective to reuse encoder weights trained on a segmentation or reconstruction task than reusing encoder weights trained on classification tasks. This finding implicates that using ImageNet-pretrained encoders for downstream segmentation problems is suboptimal. We also propose a contrastive self-supervised approach with multiple self-reconstruction tasks, which provides encoders that are suitable for transfer learning in segmentation problems in the absence of segmentation labels.
CVNov 3, 2025Code
Towards Selection of Large Multimodal Models as Engines for Burned-in Protected Health Information Detection in Medical ImagesTuan Truong, Guillermo Jimenez Perez, Pedro Osorio et al.
The detection of Protected Health Information (PHI) in medical imaging is critical for safeguarding patient privacy and ensuring compliance with regulatory frameworks. Traditional detection methodologies predominantly utilize Optical Character Recognition (OCR) models in conjunction with named entity recognition. However, recent advancements in Large Multimodal Model (LMM) present new opportunities for enhanced text extraction and semantic analysis. In this study, we systematically benchmark three prominent closed and open-sourced LMMs, namely GPT-4o, Gemini 2.5 Flash, and Qwen 2.5 7B, utilizing two distinct pipeline configurations: one dedicated to text analysis alone and another integrating both OCR and semantic analysis. Our results indicate that LMM exhibits superior OCR efficacy (WER: 0.03-0.05, CER: 0.02-0.03) compared to conventional models like EasyOCR. However, this improvement in OCR performance does not consistently correlate with enhanced overall PHI detection accuracy. The strongest performance gains are observed on test cases with complex imprint patterns. In scenarios where text regions are well readable with sufficient contrast, and strong LMMs are employed for text analysis after OCR, different pipeline configurations yield similar results. Furthermore, we provide empirically grounded recommendations for LMM selection tailored to specific operational constraints and propose a deployment strategy that leverages scalable and modular infrastructure.
CVNov 22, 2023
Medical Image Retrieval Using Pretrained EmbeddingsFarnaz Khun Jush, Tuan Truong, Steffen Vogler et al.
A wide range of imaging techniques and data formats available for medical images make accurate retrieval from image databases challenging. Efficient retrieval systems are crucial in advancing medical research, enabling large-scale studies and innovative diagnostic tools. Thus, addressing the challenges of medical image retrieval is essential for the continued enhancement of healthcare and research. In this study, we evaluated the feasibility of employing four state-of-the-art pretrained models for medical image retrieval at modality, body region, and organ levels and compared the results of two similarity indexing approaches. Since the employed networks take 2D images, we analyzed the impacts of weighting and sampling strategies to incorporate 3D information during retrieval of 3D volumes. We showed that medical image retrieval is feasible using pretrained networks without any additional training or fine-tuning steps. Using pretrained embeddings, we achieved a recall of 1 for various tasks at modality, body region, and organ level.
IVMar 28, 2023
fRegGAN with K-space Loss Regularization for Medical Image TranslationIvo M. Baltruschat, Felix Kreis, Alexander Hoelscher et al.
Generative adversarial networks (GANs) have shown remarkable success in generating realistic images and are increasingly used in medical imaging for image-to-image translation tasks. However, GANs tend to suffer from a frequency bias towards low frequencies, which can lead to the removal of important structures in the generated images. To address this issue, we propose a novel frequency-aware image-to-image translation framework based on the supervised RegGAN approach, which we call fRegGAN. The framework employs a K-space loss to regularize the frequency content of the generated images and incorporates well-known properties of MRI K-space geometry to guide the network training process. By combine our method with the RegGAN approach, we can mitigate the effect of training with misaligned data and frequency bias at the same time. We evaluate our method on the public BraTS dataset and outperform the baseline methods in terms of both quantitative and qualitative metrics when synthesizing T2-weighted from T1-weighted MR images. Detailed ablation studies are provided to understand the effect of each modification on the final performance. The proposed method is a step towards improving the performance of image-to-image translation and synthesis in the medical domain and shows promise for other applications in the field of image processing and generation.
IVNov 20, 2023
Uncertainty Estimation in Contrast-Enhanced MR Image Translation with Multi-Axis FusionIvo M. Baltruschat, Parvaneh Janbakhshi, Melanie Dohmen et al.
In recent years, deep learning has been applied to a wide range of medical imaging and image processing tasks. In this work, we focus on the estimation of epistemic uncertainty for 3D medical image-to-image translation. We propose a novel model uncertainty quantification method, Multi-Axis Fusion (MAF), which relies on the integration of complementary information derived from multiple views on volumetric image data. The proposed approach is applied to the task of synthesizing contrast enhanced T1-weighted images based on native T1, T2 and T2-FLAIR scans. The quantitative findings indicate a strong correlation ($ρ_{\text healthy} = 0.89$) between the mean absolute image synthetization error and the mean uncertainty score for our MAF method. Hence, we consider MAF as a promising approach to solve the highly relevant task of detecting synthetization failures at inference time.
IVAug 12, 2024
Five Pitfalls When Assessing Synthetic Medical Images with Reference MetricsMelanie Dohmen, Tuan Truong, Ivo M. Baltruschat et al.
Reference metrics have been developed to objectively and quantitatively compare two images. Especially for evaluating the quality of reconstructed or compressed images, these metrics have shown very useful. Extensive tests of such metrics on benchmarks of artificially distorted natural images have revealed which metric best correlate with human perception of quality. Direct transfer of these metrics to the evaluation of generative models in medical imaging, however, can easily lead to pitfalls, because assumptions about image content, image data format and image interpretation are often very different. Also, the correlation of reference metrics and human perception of quality can vary strongly for different kinds of distortions and commonly used metrics, such as SSIM, PSNR and MAE are not the best choice for all situations. We selected five pitfalls that showcase unexpected and probably undesired reference metric scores and discuss strategies to avoid them.
SDApr 22, 2022
Fused Audio Instance and Representation for Respiratory Disease DetectionTuan Truong, Matthias Lenga, Antoine Serrurier et al.
Audio-based classification techniques on body sounds have long been studied to aid in the diagnosis of respiratory diseases. While most research is centered on the use of cough as the main biomarker, other body sounds also have the potential to detect respiratory diseases. Recent studies on COVID-19 have shown that breath and speech sounds, in addition to cough, correlate with the disease. Our study proposes Fused Audio Instance and Representation (FAIR) as a method for respiratory disease detection. FAIR relies on constructing a joint feature vector from various body sounds represented in waveform and spectrogram form. We conducted experiments on the use case of COVID-19 detection by combining waveform and spectrogram representation of body sounds. Our findings show that the use of self-attention to combine extracted features from cough, breath, and speech sounds leads to the best performance with an Area Under the Receiver Operating Characteristic Curve (AUC) score of 0.8658, a sensitivity of 0.8057, and a specificity of 0.7958. Compared to models trained solely on spectrograms or waveforms, the use of both representations results in an improved AUC score, demonstrating that combining spectrogram and waveform representation helps to enrich the extracted features and outperforms the models that use only one representation.
CVFeb 9
Zero-shot System for Automatic Body Region Detection for Volumetric CT and MR ImagesFarnaz Khun Jush, Grit Werner, Mark Klemens et al.
Reliable identification of anatomical body regions is a prerequisite for many automated medical imaging workflows, yet existing solutions remain heavily dependent on unreliable DICOM metadata. Current solutions mainly use supervised learning, which limits their applicability in many real-world scenarios. In this work, we investigate whether body region detection in volumetric CT and MR images can be achieved in a fully zero-shot manner by using knowledge embedded in large pre-trained foundation models. We propose and systematically evaluate three training-free pipelines: (1) a segmentation-driven rule-based system leveraging pre-trained multi-organ segmentation models, (2) a Multimodal Large Language Model (MLLM) guided by radiologist-defined rules, and (3) a segmentation-aware MLLM that combines visual input with explicit anatomical evidence. All methods are evaluated on 887 heterogeneous CT and MR scans with manually verified anatomical region labels. The segmentation-driven rule-based approach achieves the strongest and most consistent performance, with weighted F1-scores of 0.947 (CT) and 0.914 (MR), demonstrating robustness across modalities and atypical scan coverage. The MLLM performs competitively in visually distinctive regions, while the segmentation-aware MLLM reveals fundamental limitations.
IVMay 14, 2024
Similarity and Quality Metrics for MR Image-To-Image TranslationMelanie Dohmen, Mark A. Klemens, Ivo M. Baltruschat et al.
Image-to-image translation can create large impact in medical imaging, as images can be synthetically transformed to other modalities, sequence types, higher resolutions or lower noise levels. To ensure patient safety, these methods should be validated by human readers, which requires a considerable amount of time and costs. Quantitative metrics can effectively complement such studies and provide reproducible and objective assessment of synthetic images. If a reference is available, the similarity of MR images is frequently evaluated by SSIM and PSNR metrics, even though these metrics are not or too sensitive regarding specific distortions. When reference images to compare with are not available, non-reference quality metrics can reliably detect specific distortions, such as blurriness. To provide an overview on distortion sensitivity, we quantitatively analyze 11 similarity (reference) and 12 quality (non-reference) metrics for assessing synthetic images. We additionally include a metric on a downstream segmentation task. We investigate the sensitivity regarding 11 kinds of distortions and typical MR artifacts, and analyze the influence of different normalization methods on each metric and distortion. Finally, we derive recommendations for effective usage of the analyzed similarity and quality metrics for evaluation of image-to-image translation models.
IVMar 12, 2024
BraSyn 2023 challenge: Missing MRI synthesis and the effect of different learning objectivesIvo M. Baltruschat, Parvaneh Janbakhshi, Matthias Lenga
This work addresses the Brain Magnetic Resonance Image Synthesis for Tumor Segmentation (BraSyn) challenge, which was hosted as part of the Brain Tumor Segmentation (BraTS) challenge in 2023. In this challenge, researchers are invited to synthesize a missing magnetic resonance image sequence, given other available sequences, to facilitate tumor segmentation pipelines trained on complete sets of image sequences. This problem can be tackled using deep learning within the framework of paired image-to-image translation. In this study, we propose investigating the effectiveness of a commonly used deep learning framework, such as Pix2Pix, trained under the supervision of different image-quality loss functions. Our results indicate that the use of different loss functions significantly affects the synthesis quality. We systematically study the impact of various loss functions in the multi-sequence MR image synthesis setting of the BraSyn challenge. Furthermore, we demonstrate how image synthesis performance can be optimized by combining different learning objectives beneficially.
CVDec 12, 2023
Benchmarking Pretrained Vision Embeddings for Near- and Duplicate Detection in Medical ImagesTuan Truong, Farnaz Khun Jush, Matthias Lenga
Near- and duplicate image detection is a critical concern in the field of medical imaging. Medical datasets often contain similar or duplicate images from various sources, which can lead to significant performance issues and evaluation biases, especially in machine learning tasks due to data leakage between training and testing subsets. In this paper, we present an approach for identifying near- and duplicate 3D medical images leveraging publicly available 2D computer vision embeddings. We assessed our approach by comparing embeddings extracted from two state-of-the-art self-supervised pretrained models and two different vector index structures for similarity retrieval. We generate an experimental benchmark based on the publicly available Medical Segmentation Decathlon dataset. The proposed method yields promising results for near- and duplicate image detection achieving a mean sensitivity and specificity of 0.9645 and 0.8559, respectively.
CVMay 15, 2024
Content-Based Image Retrieval for Multi-Class Volumetric Radiology Images: A Benchmark StudyFarnaz Khun Jush, Steffen Vogler, Tuan Truong et al.
While content-based image retrieval (CBIR) has been extensively studied in natural image retrieval, its application to medical images presents ongoing challenges, primarily due to the 3D nature of medical images. Recent studies have shown the potential use of pre-trained vision embeddings for CBIR in the context of radiology image retrieval. However, a benchmark for the retrieval of 3D volumetric medical images is still lacking, hindering the ability to objectively evaluate and compare the efficiency of proposed CBIR approaches in medical imaging. In this study, we extend previous work and establish a benchmark for region-based and localized multi-organ retrieval using the TotalSegmentator dataset (TS) with detailed multi-organ annotations. We benchmark embeddings derived from pre-trained supervised models on medical images against embeddings derived from pre-trained unsupervised models on non-medical images for 29 coarse and 104 detailed anatomical structures in volume and region levels. For volumetric image retrieval, we adopt a late interaction re-ranking method inspired by text matching. We compare it against the original method proposed for volume and region retrieval and achieve a retrieval recall of 1.0 for diverse anatomical regions with a wide size range. The findings and methodologies presented in this paper provide insights and benchmarks for further development and evaluation of CBIR approaches in the context of medical imaging.
CVJan 16, 2025
Exploring AI-based System Design for Pixel-level Protected Health Information Detection in Medical ImagesTuan Truong, Ivo M. Baltruschat, Mark Klemens et al.
De-identification of medical images is a critical step to ensure privacy during data sharing in research and clinical settings. The initial step in this process involves detecting Protected Health Information (PHI), which can be found in image metadata or imprinted within image pixels. Despite the importance of such systems, there has been limited evaluation of existing AI-based solutions, creating barriers to the development of reliable and robust tools. In this study, we present an AI-based pipeline for PHI detection, comprising three key modules: text detection, text extraction, and text analysis. We benchmark three models - YOLOv11, EasyOCR, and GPT-4o - across different setups corresponding to these modules, evaluating their performance on two different datasets encompassing multiple imaging modalities and PHI categories. Our findings indicate that the optimal setup involves utilizing dedicated vision and language models for each module, which achieves a commendable balance in performance, latency, and cost associated with the usage of Large Language Models (LLMs). Additionally, we show that the application of LLMs not only involves identifying PHI content but also enhances OCR tasks and facilitates an end-to-end PHI detection pipeline, showcasing promising outcomes through our analysis.
CVJul 23, 2025
Content-based 3D Image Retrieval and a ColBERT-inspired Re-ranking for Tumor Flagging and StagingFarnaz Khun Jush, Steffen Vogler, Matthias Lenga
The increasing volume of medical images poses challenges for radiologists in retrieving relevant cases. Content-based image retrieval (CBIR) systems offer potential for efficient access to similar cases, yet lack standardized evaluation and comprehensive studies. Building on prior studies for tumor characterization via CBIR, this study advances CBIR research for volumetric medical images through three key contributions: (1) a framework eliminating reliance on pre-segmented data and organ-specific datasets, aligning with large and unstructured image archiving systems, i.e. PACS in clinical practice; (2) introduction of C-MIR, a novel volumetric re-ranking method adapting ColBERT's contextualized late interaction mechanism for 3D medical imaging; (3) comprehensive evaluation across four tumor sites using three feature extractors and three database configurations. Our evaluations highlight the significant advantages of C-MIR. We demonstrate the successful adaptation of the late interaction principle to volumetric medical images, enabling effective context-aware re-ranking. A key finding is C-MIR's ability to effectively localize the region of interest, eliminating the need for pre-segmentation of datasets and offering a computationally efficient alternative to systems relying on expensive data enrichment steps. C-MIR demonstrates promising improvements in tumor flagging, achieving improved performance, particularly for colon and lung tumors (p<0.05). C-MIR also shows potential for improving tumor staging, warranting further exploration of its capabilities. Ultimately, our work seeks to bridge the gap between advanced retrieval techniques and their practical applications in healthcare, paving the way for improved diagnostic processes.
CVAug 23, 2021
How Transferable Are Self-supervised Features in Medical Image Classification Tasks?Tuan Truong, Sadegh Mohammadi, Matthias Lenga
Transfer learning has become a standard practice to mitigate the lack of labeled data in medical classification tasks. Whereas finetuning a downstream task using supervised ImageNet pretrained features is straightforward and extensively investigated in many works, there is little study on the usefulness of self-supervised pretraining. In this paper, we assess the transferability of ImageNet self-supervisedpretraining by evaluating the performance of models initialized with pretrained features from three self-supervised techniques (SimCLR, SwAV, and DINO) on selected medical classification tasks. The chosen tasks cover tumor detection in sentinel axillary lymph node images, diabetic retinopathy classification in fundus images, and multiple pathological condition classification in chest X-ray images. We demonstrate that self-supervised pretrained models yield richer embeddings than their supervised counterpart, which benefits downstream tasks in view of both linear evaluation and finetuning. For example, in view of linear evaluation at acritically small subset of the data, we see an improvement up to 14.79% in Kappa score in the diabetic retinopathy classification task, 5.4% in AUC in the tumor classification task, 7.03% AUC in the pneumonia detection, and 9.4% in AUC in the detection of pathological conditions in chest X-ray. In addition, we introduce Dynamic Visual Meta-Embedding (DVME) as an end-to-end transfer learning approach that fuses pretrained embeddings from multiple models. We show that the collective representation obtained by DVME leads to a significant improvement in the performance of selected tasks compared to using a single pretrained model approach and can be generalized to any combination of pretrained models.
IVOct 2, 2020
Multi-Resolution 3D Convolutional Neural Networks for Automatic Coronary Centerline Extraction in Cardiac CT Angiography ScansZohaib Salahuddin, Matthias Lenga, Hannes Nickisch
We propose a deep learning-based automatic coronary artery tree centerline tracker (AuCoTrack) extending the vessel tracker by Wolterink (arXiv:1810.03143). A dual pathway Convolutional Neural Network (CNN) operating on multi-scale 3D inputs predicts the direction of the coronary arteries as well as the presence of a bifurcation. A similar multi-scale dual pathway 3D CNN is trained to identify coronary artery endpoints for terminating the tracking process. Two or more continuation directions are derived based on the bifurcation detection. The iterative tracker detects the entire left and right coronary artery trees based on only two ostium landmarks derived from a model-based segmentation of the heart. The 3D CNNs were trained on a proprietary dataset consisting of 43 CCTA scans. An average sensitivity of 87.1% and clinically relevant overlap of 89.1% was obtained relative to a refined manual segmentation. In addition, the MICCAI 2008 Coronary Artery Tracking Challenge (CAT08) training and test datasets were used to benchmark the algorithm and to assess its generalization. An average overlap of 93.6% and a clinically relevant overlap of 96.4% were obtained. The proposed method achieved better overlap scores than the current state-of-the-art automatic centerline extraction techniques on the CAT08 dataset with a vessel detection rate of 95%.
CVJan 16, 2020
Continual Learning for Domain Adaptation in Chest X-ray ClassificationMatthias Lenga, Heinrich Schulz, Axel Saalbach
Over the last years, Deep Learning has been successfully applied to a broad range of medical applications. Especially in the context of chest X-ray classification, results have been reported which are on par, or even superior to experienced radiologists. Despite this success in controlled experimental environments, it has been noted that the ability of Deep Learning models to generalize to data from a new domain (with potentially different tasks) is often limited. In order to address this challenge, we investigate techniques from the field of Continual Learning (CL) including Joint Training (JT), Elastic Weight Consolidation (EWC) and Learning Without Forgetting (LWF). Using the ChestX-ray14 and the MIMIC-CXR datasets, we demonstrate empirically that these methods provide promising options to improve the performance of Deep Learning models on a target domain and to mitigate effectively catastrophic forgetting for the source domain. To this end, the best overall performance was obtained using JT, while for LWF competitive results could be achieved - even without accessing data from the source domain.
CVSep 19, 2018
Deep Learning Based Rib Centerline Extraction and LabelingMatthias Lenga, Tobias Klinder, Christian Bürger et al.
Automated extraction and labeling of rib centerlines is a typically needed prerequisite for more advanced assisted reading tools that help the radiologist to efficiently inspect all 24 ribs in a CT volume. In this paper, we combine a deep learning-based rib detection with a dedicated centerline extraction algorithm applied to the detection result for the purpose of fast, robust and accurate rib centerline extraction and labeling from CT volumes. More specifically, we first apply a fully convolutional neural network (FCNN) to generate a probability map for detecting the first rib pair, the twelfth rib pair, and the collection of all intermediate ribs. In a second stage, a newly designed centerline extraction algorithm is applied to this multi-label probability map. Finally, the distinct detection of first and twelfth rib separately, allows to derive individual rib labels by simple sorting and counting the detected centerlines. We applied our method to CT volumes from 116 patients which included a variety of different challenges and achieved a centerline accuracy of 0.787 mm with respect to manual centerline annotations. This article is a preprint version of: Lenga M., Klinder T., Bürger C., von Berg J., Franz A., Lorenz C. (2019) Deep Learning Based Rib Centerline Extraction and Labeling. In: Vrtovec T., Yao J., Zheng G., Pozo J. (eds) Computational Methods and Clinical Applications in Musculoskeletal Imaging. MSKI 2018. Lecture Notes in Computer Science, vol 11404. Springer, Cham