CLDec 30, 2022
ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology ReportsKatharina Jeblick, Balthasar Schachtner, Jakob Dexl et al.
The release of ChatGPT, a language model capable of generating text that appears human-like and authentic, has gained significant attention beyond the research community. We expect that the convincing performance of ChatGPT incentivizes users to apply it to a variety of downstream tasks, including prompting the model to simplify their own medical reports. To investigate this phenomenon, we conducted an exploratory case study. In a questionnaire, we asked 15 radiologists to assess the quality of radiology reports simplified by ChatGPT. Most radiologists agreed that the simplified reports were factually correct, complete, and not potentially harmful to the patient. Nevertheless, instances of incorrect statements, missed key medical findings, and potentially harmful passages were reported. While further studies are needed, the initial insights of this study indicate a great potential in using large language models like ChatGPT to improve patient-centered care in radiology and other medical domains.
IVMar 20, 2023
Cascaded Latent Diffusion Models for High-Resolution Chest X-ray SynthesisTobias Weber, Michael Ingrisch, Bernd Bischl et al.
While recent advances in large-scale foundational models show promising results, their application to the medical domain has not yet been explored in detail. In this paper, we progress into the realms of large-scale modeling in medical synthesis by proposing Cheff - a foundational cascaded latent diffusion model, which generates highly-realistic chest radiographs providing state-of-the-art quality on a 1-megapixel scale. We further propose MaCheX, which is a unified interface for public chest datasets and forms the largest open collection of chest X-rays up to date. With Cheff conditioned on radiological reports, we further guide the synthesis process over text prompts and unveil the research area of report-to-chest-X-ray generation.
CLJun 5, 2023
German CheXpert Chest X-ray Radiology Report LabelerAlessandro Wollek, Sardi Hyska, Thomas Sedlmeyr et al.
This study aimed to develop an algorithm to automatically extract annotations for chest X-ray classification models from German thoracic radiology reports. An automatic label extraction model was designed based on the CheXpert architecture, and a web-based annotation interface was created for iterative improvements. Results showed that automated label extraction can reduce time spent on manual labeling and improve overall modeling performance. The model trained on automatically extracted labels performed competitively to manually labeled data and strongly outperformed the model trained on publicly available data.
IVAug 1, 2022
A knee cannot have lung disease: out-of-distribution detection with in-distribution voting using the medical example of chest X-ray classificationAlessandro Wollek, Theresa Willem, Michael Ingrisch et al.
To investigate the impact of OOD radiographs on existing chest X-ray classification models and to increase their robustness against OOD data. The study employed the commonly used chest X-ray classification model, CheXnet, trained on the chest X-ray 14 data set, and tested its robustness against OOD data using three public radiography data sets: IRMA, Bone Age, and MURA, and the ImageNet data set. To detect OOD data for multi-label classification, we proposed in-distribution voting (IDV). The OOD detection performance is measured across data sets using the area under the receiver operating characteristic curve (AUC) analysis and compared with Mahalanobis-based OOD detection, MaxLogit, MaxEnergy and self-supervised OOD detection (SS OOD). Without additional OOD detection, the chest X-ray classifier failed to discard any OOD images, with an AUC of 0.5. The proposed IDV approach trained on ID (chest X-ray 14) and OOD data (IRMA and ImageNet) achieved, on average, 0.999 OOD AUC across the three data sets, surpassing all other OOD detection methods. Mahalanobis-based OOD detection achieved an average OOD detection AUC of 0.982. IDV trained solely with a few thousand ImageNet images had an AUC 0.913, which was higher than MaxLogit (0.726), MaxEnergy (0.724), and SS OOD (0.476). The performance of all tested OOD detection methods did not translate well to radiography data sets, except Mahalanobis-based OOD detection and the proposed IDV method. Training solely on ID data led to incorrect classification of OOD images as ID, resulting in increased false positive rates. IDV substantially improved the model's ID classification performance, even when trained with data that will not occur in the intended use case or test set, without additional inference overhead.
IVJun 9, 2023
WindowNet: Learnable Windows for Chest X-ray ClassificationAlessandro Wollek, Sardi Hyska, Bastian Sabel et al.
Chest X-ray (CXR) images are commonly compressed to a lower resolution and bit depth to reduce their size, potentially altering subtle diagnostic features. Radiologists use windowing operations to enhance image contrast, but the impact of such operations on CXR classification performance is unclear. In this study, we show that windowing can improve CXR classification performance, and propose WindowNet, a model that learns optimal window settings. We first investigate the impact of bit-depth on classification performance and find that a higher bit-depth (12-bit) leads to improved performance. We then evaluate different windowing settings and show that training with a distinct window generally improves pathology-wise classification performance. Finally, we propose and evaluate WindowNet, a model that learns optimal window settings, and show that it significantly improves performance compared to the baseline model without windowing.
CVJun 9, 2023
Higher Chest X-ray Resolution Improves Classification PerformanceAlessandro Wollek, Sardi Hyska, Bastian Sabel et al.
Deep learning models for image classification are often trained at a resolution of 224 x 224 pixels for historical and efficiency reasons. However, chest X-rays are acquired at a much higher resolution to display subtle pathologies. This study investigates the effect of training resolution on chest X-ray classification performance, using the chest X-ray 14 dataset. The results show that training with a higher image resolution, specifically 1024 x 1024 pixels, results in the best overall classification performance with a mean AUC of 84.2 % compared to 82.7 % when trained with 256 x 256 pixel images. Additionally, comparison of bounding boxes and GradCAM saliency maps suggest that low resolutions, such as 256 x 256 pixels, are insufficient for identifying small pathologies and force the model to use spurious discriminating features. Our code is publicly available at https://gitlab.lrz.de/IP/cxr-resolution
CLJun 9, 2023
Automated Labeling of German Chest X-Ray Radiology Reports using Deep LearningAlessandro Wollek, Philip Haitzer, Thomas Sedlmeyr et al.
Radiologists are in short supply globally, and deep learning models offer a promising solution to address this shortage as part of clinical decision-support systems. However, training such models often requires expensive and time-consuming manual labeling of large datasets. Automatic label extraction from radiology reports can reduce the time required to obtain labeled datasets, but this task is challenging due to semantically similar words and missing annotated data. In this work, we explore the potential of weak supervision of a deep learning-based label prediction model, using a rule-based labeler. We propose a deep learning-based CheXpert label prediction model, pre-trained on reports labeled by a rule-based German CheXpert model and fine-tuned on a small dataset of manually labeled reports. Our results demonstrate the effectiveness of our approach, which significantly outperformed the rule-based model on all three tasks. Our findings highlight the benefits of employing deep learning-based models even in scenarios with sparse data and the use of the rule-based labeler as a tool for weak supervision.
CVMay 7
The autoPET3 Challenge -- Automated Lesion Segmentation in Whole-Body PET/CT - Multitracer Multicenter GeneralizationJakob Dexl, Katharina Jeblick, Andreas Mittermeier et al.
We report the design and results of the third autoPET challenge (MICCAI 2024), which benchmarked automated lesion segmentation in whole-body PET/CT under a compositional generalization setting. Training data comprised 1,014 [18F]-FDG PET/CT studies from the University Hospital Tübingen and 597 [18F]/[68Ga]-PSMA PET/CT studies from the LMU University Hospital Munich, constituting the largest publicly available annotated PSMA PET/CT dataset to date. The held-out test set of 200 studies covered four tracer-center combinations, two of which represented unseen compositional pairings. A complementary data-centric award category isolated the contribution of data handling strategies by restricting participants to a fixed baseline model. Seventeen teams submitted 27 algorithms, predominantly nnU-Net-based 3D networks with PET/CT channel concatenation. The top-ranked algorithm achieved a mean DSC of 0.66, FNV of 3.18 mL, and FPV of 2.78 mL across all four test conditions, improving DSC by 8% and reducing the false-negative volume by 5 mL relative to the provided baseline. Ranking was stable across bootstrap resampling and alternative ranking schemes for the top tier. Beyond the benchmark, we provide an in-depth analysis of segmentation performance at the patient and lesion level. Three main conclusions can be drawn: (1) in-domain multitracer PET/CT segmentation is sufficient and probably approaching reader agreement; (2) compositional generalization to unseen tracer-center combinations remains an open problem mainly driven by systematic volume overestimation; (3) heterogeneity and case difficulty drive performance variation substantially more than the choice of algorithm among top-ranked teams.
LGNov 2, 2023
Post-hoc Orthogonalization for Mitigation of Protected Feature Bias in CXR EmbeddingsTobias Weber, Michael Ingrisch, Bernd Bischl et al.
Purpose: To analyze and remove protected feature effects in chest radiograph embeddings of deep learning models. Methods: An orthogonalization is utilized to remove the influence of protected features (e.g., age, sex, race) in CXR embeddings, ensuring feature-independent results. To validate the efficacy of the approach, we retrospectively study the MIMIC and CheXpert datasets using three pre-trained models, namely a supervised contrastive, a self-supervised contrastive, and a baseline classifier model. Our statistical analysis involves comparing the original versus the orthogonalized embeddings by estimating protected feature influences and evaluating the ability to predict race, age, or sex using the two types of embeddings. Results: Our experiments reveal a significant influence of protected features on predictions of pathologies. Applying orthogonalization removes these feature effects. Apart from removing any influence on pathology classification, while maintaining competitive predictive performance, orthogonalized embeddings further make it infeasible to directly predict protected attributes and mitigate subgroup disparities. Conclusion: The presented work demonstrates the successful application and evaluation of the orthogonalization technique in the domain of chest X-ray image classification.
IVApr 15, 2024
Post-Training Network Compression for 3D Medical Image Segmentation: Reducing Computational Efforts via Tucker DecompositionTobias Weber, Jakob Dexl, David Rügamer et al.
We address the computational barrier of deploying advanced deep learning segmentation models in clinical settings by studying the efficacy of network compression through tensor decomposition. We propose a post-training Tucker factorization that enables the decomposition of pre-existing models to reduce computational requirements without impeding segmentation accuracy. We applied Tucker decomposition to the convolutional kernels of the TotalSegmentator (TS) model, an nnU-Net model trained on a comprehensive dataset for automatic segmentation of 117 anatomical structures. Our approach reduced the floating-point operations (FLOPs) and memory required during inference, offering an adjustable trade-off between computational efficiency and segmentation quality. This study utilized the publicly available TS dataset, employing various downsampling factors to explore the relationship between model size, inference speed, and segmentation performance. The application of Tucker decomposition to the TS model substantially reduced the model parameters and FLOPs across various compression rates, with limited loss in segmentation accuracy. We removed up to 88% of the model's parameters with no significant performance changes in the majority of classes after fine-tuning. Practical benefits varied across different graphics processing unit (GPU) architectures, with more distinct speed-ups on less powerful hardware. Post-hoc network compression via Tucker decomposition presents a viable strategy for reducing the computational demand of medical image segmentation models without substantially sacrificing accuracy. This approach enables the broader adoption of advanced deep learning technologies in clinical practice, offering a way to navigate the constraints of hardware capabilities.
IVMay 25, 2023
Constrained Probabilistic Mask Learning for Task-specific Undersampled MRI ReconstructionTobias Weber, Michael Ingrisch, Bernd Bischl et al.
Undersampling is a common method in Magnetic Resonance Imaging (MRI) to subsample the number of data points in k-space, reducing acquisition times at the cost of decreased image quality. A popular approach is to employ undersampling patterns following various strategies, e.g., variable density sampling or radial trajectories. In this work, we propose a method that directly learns the undersampling masks from data points, thereby also providing task- and domain-specific patterns. To solve the resulting discrete optimization problem, we propose a general optimization routine called ProM: A fully probabilistic, differentiable, versatile, and model-free framework for mask optimization that enforces acceleration factors through a convex constraint. Analyzing knee, brain, and cardiac MRI datasets with our method, we discover that different anatomic regions reveal distinct optimal undersampling masks, demonstrating the benefits of using custom masks, tailored for a downstream task. For example, ProM can create undersampling masks that maximize performance in downstream tasks like segmentation with networks trained on fully-sampled MRIs. Even with extreme acceleration factors, ProM yields reasonable performance while being more versatile than existing methods, paving the way for data-driven all-purpose mask generation.
LGOct 21, 2021
Towards modelling hazard factors in unstructured data spaces using gradient-based latent interpolationTobias Weber, Michael Ingrisch, Bernd Bischl et al.
The application of deep learning in survival analysis (SA) allows utilizing unstructured and high-dimensional data types uncommon in traditional survival methods. This allows to advance methods in fields such as digital health, predictive maintenance, and churn analysis, but often yields less interpretable and intuitively understandable models due to the black-box character of deep learning-based approaches. We close this gap by proposing 1) a multi-task variational autoencoder (VAE) with survival objective, yielding survival-oriented embeddings, and 2) a novel method HazardWalk that allows to model hazard factors in the original data space. HazardWalk transforms the latent distribution of our autoencoder into areas of maximized/minimized hazard and then uses the decoder to project changes to the original domain. Our procedure is evaluated on a simulated dataset as well as on a dataset of CT imaging data of patients with liver metastases.
LGOct 21, 2021
Survival-oriented embeddings for improving accessibility to complex data structuresTobias Weber, Michael Ingrisch, Matthias Fabritius et al.
Deep learning excels in the analysis of unstructured data and recent advancements allow to extend these techniques to survival analysis. In the context of clinical radiology, this enables, e.g., to relate unstructured volumetric images to a risk score or a prognosis of life expectancy and support clinical decision making. Medical applications are, however, associated with high criticality and consequently, neither medical personnel nor patients do usually accept black box models as reason or basis for decisions. Apart from averseness to new technologies, this is due to missing interpretability, transparency and accountability of many machine learning methods. We propose a hazard-regularized variational autoencoder that supports straightforward interpretation of deep neural architectures in the context of survival analysis, a field highly relevant in healthcare. We apply the proposed approach to abdominal CT scans of patients with liver tumors and their corresponding survival times.