Dominik Müller

IV
h-index6
18papers
1,018citations
Novelty17%
AI Score32

18 Papers

CVDec 16, 2022
Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann, Annika Reinke, Vivienn Weru et al. · utoronto

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.

CVOct 24, 2022Code
MISm: A Medical Image Segmentation Metric for Evaluation of weak labeled Data

Dennis Hartmann, Verena Schmid, Philip Meyer et al.

Performance measures are an important tool for assessing and comparing different medical image segmentation algorithms. Unfortunately, the current measures have their weaknesses when it comes to assessing certain edge cases. These limitations arouse when images with a very small region of interest or without a region of interest at all are assessed. As a solution for these limitations, we propose a new medical image segmentation metric: MISm. To evaluate MISm, the popular metrics in the medical image segmentation and MISm were compared using images of magnet resonance tomography from several scenarios. In order to allow application in the community and reproducibility of experimental results, we included MISm in the publicly available evaluation framework MISeval: https://github.com/frankkramer-lab/miseval/tree/master/miseval

IVOct 20, 2022
Standardized Medical Image Classification across Medical Disciplines

Simone Mayer, Dominik Müller, Frank Kramer

AUCMEDI is a Python-based framework for medical image classification. In this paper, we evaluate the capabilities of AUCMEDI, by applying it to multiple datasets. Datasets were specifically chosen to cover a variety of medical disciplines and imaging modalities. We designed a simple pipeline using Jupyter notebooks and applied it to all datasets. Results show that AUCMEDI was able to train a model with accurate classification capabilities for each dataset: Averaged AUC per dataset range between 0.82 and 1.0, averaged F1 scores range between 0.61 and 1.0. With its high adaptability and strong performance, AUCMEDI proves to be a powerful instrument to build widely applicable neural networks. The notebooks serve as application examples for AUCMEDI.

CVJun 16, 2022
Nucleus Segmentation and Analysis in Breast Cancer with the MIScnn Framework

Adrian Pfleiderer, Dominik Müller, Frank Kramer

The NuCLS dataset contains over 220.000 annotations of cell nuclei in breast cancers. We show how to use these data to create a multi-rater model with the MIScnn Framework to automate the analysis of cell nuclei. For the model creation, we use the widespread U-Net approach embedded in a pipeline. This pipeline provides besides the high performance convolution neural network, several preprocessor techniques and a extended data exploration. The final model is tested in the evaluation phase using a wide variety of metrics with a subsequent visualization. Finally, the results are compared and interpreted with the results of the NuCLS study. As an outlook, indications are given which are important for the future development of models in the context of cell nuclei.

IVMar 25, 2024Code
DeepGleason: a System for Automated Gleason Grading of Prostate Cancer using Deep Neural Networks

Dominik Müller, Philip Meyer, Lukas Rentschler et al.

Advances in digital pathology and artificial intelligence (AI) offer promising opportunities for clinical decision support and enhancing diagnostic workflows. Previous studies already demonstrated AI's potential for automated Gleason grading, but lack state-of-the-art methodology and model reusability. To address this issue, we propose DeepGleason: an open-source deep neural network based image classification system for automated Gleason grading using whole-slide histopathology images from prostate tissue sections. Implemented with the standardized AUCMEDI framework, our tool employs a tile-wise classification approach utilizing fine-tuned image preprocessing techniques in combination with a ConvNeXt architecture which was compared to various state-of-the-art architectures. The neural network model was trained and validated on an in-house dataset of 34,264 annotated tiles from 369 prostate carcinoma slides. We demonstrated that DeepGleason is capable of highly accurate and reliable Gleason grading with a macro-averaged F1-score of 0.806, AUC of 0.991, and Accuracy of 0.974. The internal architecture comparison revealed that the ConvNeXt model was superior performance-wise on our dataset to established and other modern architectures like transformers. Furthermore, we were able to outperform the current state-of-the-art in tile-wise fine-classification with a sensitivity and specificity of 0.94 and 0.98 for benign vs malignant detection as well as of 0.91 and 0.75 for Gleason 3 vs Gleason 4 & 5 classification, respectively. Our tool contributes to the wider adoption of AI-based Gleason grading within the research community and paves the way for broader clinical application of deep learning models in digital pathology. DeepGleason is open-source and publicly available for research application in the following Git repository: https://github.com/frankkramer-lab/DeepGleason.

CVJan 23, 2022Code
MISeval: a Metric Library for Medical Image Segmentation Evaluation

Dominik Müller, Dennis Hartmann, Philip Meyer et al.

Correct performance assessment is crucial for evaluating modern artificial intelligence algorithms in medicine like deep-learning based medical image segmentation models. However, there is no universal metric library in Python for standardized and reproducible evaluation. Thus, we propose our open-source publicly available Python package MISeval: a metric library for Medical Image Segmentation Evaluation. The implemented metrics can be intuitively used and easily integrated into any performance assessment pipeline. The package utilizes modern CI/CD strategies to ensure functionality and stability. MISeval is available from PyPI (miseval) and GitHub: https://github.com/frankkramer-lab/miseval.

IVJun 24, 2020Code
Automated Chest CT Image Segmentation of COVID-19 Lung Infection based on 3D U-Net

Dominik Müller, Iñaki Soto Rey, Frank Kramer

The coronavirus disease 2019 (COVID-19) affects billions of lives around the world and has a significant impact on public healthcare. Due to rising skepticism towards the sensitivity of RT-PCR as screening method, medical imaging like computed tomography offers great potential as alternative. For this reason, automated image segmentation is highly desired as clinical decision support for quantitative assessment and disease monitoring. However, publicly available COVID-19 imaging data is limited which leads to overfitting of traditional approaches. To address this problem, we propose an innovative automated segmentation pipeline for COVID-19 infected regions, which is able to handle small datasets by utilization as variant databases. Our method focuses on on-the-fly generation of unique and random image patches for training by performing several preprocessing methods and exploiting extensive data augmentation. For further reduction of the overfitting risk, we implemented a standard 3D U-Net architecture instead of new or computational complex neural network architectures. Through a 5-fold cross-validation on 20 CT scans of COVID-19 patients, we were able to develop a highly accurate as well as robust segmentation model for lungs and COVID-19 infected regions without overfitting on the limited data. Our method achieved Dice similarity coefficients of 0.956 for lungs and 0.761 for infection. We demonstrated that the proposed method outperforms related approaches, advances the state-of-the-art for COVID-19 segmentation and improves medical image analysis with limited data. The code and model are available under the following link: https://github.com/frankkramer-lab/covid19.MIScnn

IVOct 21, 2019Code
MIScnn: A Framework for Medical Image Segmentation with Convolutional Neural Networks and Deep Learning

Dominik Müller, Frank Kramer

The increased availability and usage of modern medical imaging induced a strong need for automatic medical image segmentation. Still, current image segmentation platforms do not provide the required functionalities for plain setup of medical image segmentation pipelines. Already implemented pipelines are commonly standalone software, optimized on a specific public data set. Therefore, this paper introduces the open-source Python library MIScnn. The aim of MIScnn is to provide an intuitive API allowing fast building of medical image segmentation pipelines including data I/O, preprocessing, data augmentation, patch-wise analysis, metrics, a library with state-of-the-art deep learning models and model utilization like training, prediction, as well as fully automatic evaluation (e.g. cross-validation). Similarly, high configurability and multiple open interfaces allow full pipeline customization. Running a cross-validation with MIScnn on the Kidney Tumor Segmentation Challenge 2019 data set (multi-class semantic segmentation with 300 CT scans) resulted into a powerful predictor based on the standard 3D U-Net model. With this experiment, we could show that the MIScnn framework enables researchers to rapidly set up a complete medical image segmentation pipeline by using just a few lines of code. The source code for MIScnn is available in the Git repository: https://github.com/frankkramer-lab/MIScnn.

IVMar 25, 2024
Assessing the Performance of Deep Learning for Automated Gleason Grading in Prostate Cancer

Dominik Müller, Philip Meyer, Lukas Rentschler et al.

Prostate cancer is a dominant health concern calling for advanced diagnostic tools. Utilizing digital pathology and artificial intelligence, this study explores the potential of 11 deep neural network architectures for automated Gleason grading in prostate carcinoma focusing on comparing traditional and recent architectures. A standardized image classification pipeline, based on the AUCMEDI framework, facilitated robust evaluation using an in-house dataset consisting of 34,264 annotated tissue tiles. The results indicated varying sensitivity across architectures, with ConvNeXt demonstrating the strongest performance. Notably, newer architectures achieved superior performance, even though with challenges in differentiating closely related Gleason grades. The ConvNeXt model was capable of learning a balance between complexity and generalizability. Overall, this study lays the groundwork for enhanced Gleason grading systems, potentially improving diagnostic efficiency for prostate cancer.

CVAug 28, 2025
Classifying Mitotic Figures in the MIDOG25 Challenge with Deep Ensemble Learning and Rule Based Refinement

Sara Krauss, Ellena Spieß, Daniel Hieber et al.

Mitotic figures (MFs) are relevant biomarkers in tumor grading. Differentiating atypical MFs (AMFs) from normal MFs (NMFs) remains difficult, as manual annotation is time-consuming and subjective. In this work an ensemble of ConvNeXtBase models was trained with AUCMEDI and extend with a rule-based refinement (RBR) module. On the MIDOG25 preliminary test set, the ensemble achieved a balanced accuracy of 84.02%. While the RBR increased specificity, it reduced sensitivity and overall performance. The results show that deep ensembles perform well for AMF classification. RBR can increase specific metrics but requires further research.

IVMay 15, 2023
Towards Automated COVID-19 Presence and Severity Classification

Dominik Müller, Niklas Schröter, Silvan Mertes et al.

COVID-19 presence classification and severity prediction via (3D) thorax computed tomography scans have become important tasks in recent times. Especially for capacity planning of intensive care units, predicting the future severity of a COVID-19 patient is crucial. The presented approach follows state-of-theart techniques to aid medical professionals in these situations. It comprises an ensemble learning strategy via 5-fold cross-validation that includes transfer learning and combines pre-trained 3D-versions of ResNet34 and DenseNet121 for COVID19 classification and severity prediction respectively. Further, domain-specific preprocessing was applied to optimize model performance. In addition, medical information like the infection-lung-ratio, patient age, and sex were included. The presented model achieves an AUC of 79.0% to predict COVID-19 severity, and 83.7% AUC to classify the presence of an infection, which is comparable with other currently popular methods. This approach is implemented using the AUCMEDI framework and relies on well-known network architectures to ensure robustness and reproducibility.

IVFeb 10, 2022
Towards a Guideline for Evaluation Metrics in Medical Image Segmentation

Dominik Müller, Iñaki Soto-Rey, Frank Kramer

In the last decade, research on artificial intelligence has seen rapid growth with deep learning models, especially in the field of medical image segmentation. Various studies demonstrated that these models have powerful prediction capabilities and achieved similar results as clinicians. However, recent studies revealed that the evaluation in image segmentation studies lacks reliable model performance assessment and showed statistical bias by incorrect metric implementation or usage. Thus, this work provides an overview and interpretation guide on the following metrics for medical image segmentation evaluation in binary as well as multi-class problems: Dice similarity coefficient, Jaccard, Sensitivity, Specificity, Rand index, ROC curves, Cohen's Kappa, and Hausdorff distance. As a summary, we propose a guideline for standardized medical image segmentation evaluation to improve evaluation quality, reproducibility, and comparability in the research field.

CVJan 27, 2022
An Analysis on Ensemble Learning optimized Medical Image Classification with Deep Convolutional Neural Networks

Dominik Müller, Iñaki Soto-Rey, Frank Kramer

Novel and high-performance medical image classification pipelines are heavily utilizing ensemble learning strategies. The idea of ensemble learning is to assemble diverse models or multiple predictions and, thus, boost prediction performance. However, it is still an open question to what extent as well as which ensemble learning strategies are beneficial in deep learning based medical image classification pipelines. In this work, we proposed a reproducible medical image classification pipeline for analyzing the performance impact of the following ensemble learning techniques: Augmenting, Stacking, and Bagging. The pipeline consists of state-of-the-art preprocessing and image augmentation methods as well as 9 deep convolution neural network architectures. It was applied on four popular medical imaging datasets with varying complexity. Furthermore, 12 pooling functions for combining multiple predictions were analyzed, ranging from simple statistical functions like unweighted averaging up to more complex learning-based functions like support vector machines. Our results revealed that Stacking achieved the largest performance gain of up to 13% F1-score increase. Augmenting showed consistent improvement capabilities by up to 4% and is also applicable to single model based pipelines. Cross-validation based Bagging demonstrated significant performance gain close to Stacking, which resulted in an F1-score increase up to +11%. Furthermore, we demonstrated that simple statistical pooling functions are equal or often even better than more complex pooling functions. We concluded that the integration of ensemble learning techniques is a powerful method for any medical image classification pipeline to improve robustness and boost performance.

CYJan 25, 2022
Perspective on Code Submission and Automated Evaluation Platforms for University Teaching

Florian Auer, Johann Frei, Dominik Müller et al.

We present a perspective on platforms for code submission and automated evaluation in the context of university teaching. Due to the COVID-19 pandemic, such platforms have become an essential asset for remote courses and a reasonable standard for structured code submission concerning increasing numbers of students in computer sciences. Utilizing automated code evaluation techniques exhibits notable positive impacts for both students and teachers in terms of quality and scalability. We identified relevant technical and non-technical requirements for such platforms in terms of practical applicability and secure code submission environments. Furthermore, a survey among students was conducted to obtain empirical data on general perception. We conclude that submission and automated evaluation involves continuous maintenance yet lowers the required workload for teachers and provides better evaluation transparency for students.

CVOct 3, 2021
Classification of Viral Pneumonia X-ray Images with the Aucmedi Framework

Pia Schneider, Dominik Müller, Frank Kramer

In this work we use the AUCMEDI-Framework to train a deep neural network to classify chest X-ray images as either normal or viral pneumonia. Stratified k-fold cross-validation with k=3 is used to generate the validation-set and 15% of the data are set aside for the evaluation of the models of the different folds and ensembles each. A random-forest ensemble as well as a Soft-Majority-Vote ensemble are built from the predictions of the different folds. Evaluation metrics (Classification-Report, macro f1-scores, Confusion-Matrices, ROC-Curves) of the individual folds and the ensembles show that the classifier works well. Finally Grad-CAM and LIME explainable artificial intelligence (XAI) algorithms are applied to visualize the image features that are most important for the prediction. For Grad-CAM the heatmaps of the three folds are furthermore averaged for all images in order to calculate a mean XAI-heatmap. As the heatmaps of the different folds for most images differ only slightly this averaging procedure works well. However, only medical professionals can evaluate the quality of the features marked by the XAI. A comparison of the evaluation metrics with metrics of standard procedures such as PCR would also be important. Further limitations are discussed.

IVMar 30, 2021
Assessing the Role of Random Forests in Medical Image Segmentation

Dennis Hartmann, Dominik Müller, Iñaki Soto-Rey et al.

Neural networks represent a field of research that can quickly achieve very good results in the field of medical image segmentation using a GPU. A possible way to achieve good results without GPUs are random forests. For this purpose, two random forest approaches were compared with a state-of-the-art deep convolutional neural network. To make the comparison the PhC-C2DH-U373 and the retinal imaging datasets were used. The evaluation showed that the deep convolutional neutral network achieved the best results. However, one of the random forest approaches also achieved a similar high performance. Our results indicate that random forest approaches are a good alternative to deep convolutional neural networks and, thus, allow the usage of medical image segmentation without a GPU.

IVMar 26, 2021
Multi-Disease Detection in Retinal Imaging based on Ensembling Heterogeneous Deep Learning Models

Dominik Müller, Iñaki Soto-Rey, Frank Kramer

Preventable or undiagnosed visual impairment and blindness affect billion of people worldwide. Automated multi-disease detection models offer great potential to address this problem via clinical decision support in diagnosis. In this work, we proposed an innovative multi-disease detection pipeline for retinal imaging which utilizes ensemble learning to combine the predictive capabilities of several heterogeneous deep convolutional neural network models. Our pipeline includes state-of-the-art strategies like transfer learning, class weighting, real-time image augmentation and Focal loss utilization. Furthermore, we integrated ensemble learning techniques like heterogeneous deep learning models, bagging via 5-fold cross-validation and stacked logistic regression models. Through internal and external evaluation, we were able to validate and demonstrate high accuracy and reliability of our pipeline, as well as the comparability with other state-of-the-art pipelines for retinal disease prediction.

MLSep 15, 2017
Dependence Modeling in Ultra High Dimensions with Vine Copulas and the Graphical Lasso

Dominik Müller, Claudia Czado

To model high dimensional data, Gaussian methods are widely used since they remain tractable and yield parsimonious models by imposing strong assumptions on the data. Vine copulas are more flexible by combining arbitrary marginal distributions and (conditional) bivariate copulas. Yet, this adaptability is accompanied by sharply increasing computational effort as the dimension increases. The approach proposed in this paper overcomes this burden and makes the first step into ultra high dimensional non-Gaussian dependence modeling by using a divide-and-conquer approach. First, we apply Gaussian methods to split datasets into feasibly small subsets and second, apply parsimonious and flexible vine copulas thereon. Finally, we reconcile them into one joint model. We provide numerical results demonstrating the feasibility of our approach in moderate dimensions and showcase its ability to estimate ultra high dimensional non-Gaussian dependence models in thousands of dimensions.