IVJul 30, 2023Code
Validating polyp and instrument segmentation methods in colonoscopy through Medico 2020 and MedAI 2021 ChallengesDebesh Jha, Vanshali Sharma, Debapriya Banik et al. · oxford
Automatic analysis of colonoscopy images has been an active field of research motivated by the importance of early detection of precancerous polyps. However, detecting polyps during the live examination can be challenging due to various factors such as variation of skills and experience among the endoscopists, lack of attentiveness, and fatigue leading to a high polyp miss-rate. Deep learning has emerged as a promising solution to this challenge as it can assist endoscopists in detecting and classifying overlooked polyps and abnormalities in real time. In addition to the algorithm's accuracy, transparency and interpretability are crucial to explaining the whys and hows of the algorithm's prediction. Further, most algorithms are developed in private data, closed source, or proprietary software, and methods lack reproducibility. Therefore, to promote the development of efficient and transparent methods, we have organized the "Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image Segmentation (MedAI 2021)" competitions. We present a comprehensive summary and analyze each contribution, highlight the strength of the best-performing methods, and discuss the possibility of clinical translations of such methods into the clinic. For the transparency task, a multi-disciplinary team, including expert gastroenterologists, accessed each submission and evaluated the team based on open-source practices, failure case analysis, ablation studies, usability and understandability of evaluations to gain a deeper understanding of the models' credibility for clinical deployment. Through the comprehensive analysis of the challenge, we not only highlight the advancements in polyp and surgical instrument segmentation but also encourage qualitative evaluation for building more transparent and understandable AI-based colonoscopy systems.
IVMar 7, 2023
Patched Diffusion Models for Unsupervised Anomaly Detection in Brain MRIFinn Behrendt, Debayan Bhattacharya, Julia Krüger et al.
The use of supervised deep learning techniques to detect pathologies in brain MRI scans can be challenging due to the diversity of brain anatomy and the need for annotated data sets. An alternative approach is to use unsupervised anomaly detection, which only requires sample-level labels of healthy brains to create a reference representation. This reference representation can then be compared to unhealthy brain anatomy in a pixel-wise manner to identify abnormalities. To accomplish this, generative models are needed to create anatomically consistent MRI scans of healthy brains. While recent diffusion models have shown promise in this task, accurately generating the complex structure of the human brain remains a challenge. In this paper, we propose a method that reformulates the generation task of diffusion models as a patch-based estimation of healthy brain anatomy, using spatial context to guide and improve reconstruction. We evaluate our approach on data of tumors and multiple sclerosis lesions and demonstrate a relative improvement of 25.1% compared to existing baselines.
IVApr 11, 2022
Ultrasound Shear Wave Elasticity Imaging with Spatio-Temporal Deep LearningMaximilian Neidhardt, Marcel Bengs, Sarah Latus et al.
Ultrasound shear wave elasticity imaging is a valuable tool for quantifying the elastic properties of tissue. Typically, the shear wave velocity is derived and mapped to an elasticity value, which neglects information such as the shape of the propagating shear wave or push sequence characteristics. We present 3D spatio-temporal CNNs for fast local elasticity estimation from ultrasound data. This approach is based on retrieving elastic properties from shear wave propagation within small local regions. A large training data set is acquired with a robot from homogeneous gelatin phantoms ranging from 17.42 kPa to 126.05 kPa with various push locations. The results show that our approach can estimate elastic properties on a pixelwise basis with a mean absolute error of 5.01+-4.37 kPa. Furthermore, we estimate local elasticity independent of the push location and can even perform accurate estimates inside the push region. For phantoms with embedded inclusions, we report a 53.93% lower MAE (7.50 kPa) and on the background of 85.24% (1.64 kPa) compared to a conventional shear wave method. Overall, our method offers fast local estimations of elastic properties with small spatio-temporal window sizes.
IVSep 5, 2022
Supervised Contrastive Learning to Classify Paranasal Anomalies in the Maxillary SinusDebayan Bhattacharya, Benjamin Tobias Becker, Finn Behrendt et al.
Using deep learning techniques, anomalies in the paranasal sinus system can be detected automatically in MRI images and can be further analyzed and classified based on their volume, shape and other parameters like local contrast. However due to limited training data, traditional supervised learning methods often fail to generalize. Existing deep learning methods in paranasal anomaly classification have been used to diagnose at most one anomaly. In our work, we consider three anomalies. Specifically, we employ a 3D CNN to separate maxillary sinus volumes without anomalies from maxillary sinus volumes with anomalies. To learn robust representations from a small labelled dataset, we propose a novel learning paradigm that combines contrastive loss and cross-entropy loss. Particularly, we use a supervised contrastive loss that encourages embeddings of maxillary sinus volumes with and without anomaly to form two distinct clusters while the cross-entropy loss encourages the 3D CNN to maintain its discriminative ability. We report that optimising with both losses is advantageous over optimising with only one loss. We also find that our training strategy leads to label efficiency. With our method, a 3D CNN classifier achieves an AUROC of 0.85 while a 3D CNN classifier optimised with cross-entropy loss achieves an AUROC of 0.66.
IVMar 31, 2023
Multiple Instance Ensembling For Paranasal Anomaly Classification In The Maxillary SinusDebayan Bhattacharya, Finn Behrendt, Benjamin Tobias Becker et al.
Paranasal anomalies are commonly discovered during routine radiological screenings and can present with a wide range of morphological features. This diversity can make it difficult for convolutional neural networks (CNNs) to accurately classify these anomalies, especially when working with limited datasets. Additionally, current approaches to paranasal anomaly classification are constrained to identifying a single anomaly at a time. These challenges necessitate the need for further research and development in this area. In this study, we investigate the feasibility of using a 3D convolutional neural network (CNN) to classify healthy maxillary sinuses (MS) and MS with polyps or cysts. The task of accurately identifying the relevant MS volume within larger head and neck Magnetic Resonance Imaging (MRI) scans can be difficult, but we develop a straightforward strategy to tackle this challenge. Our end-to-end solution includes the use of a novel sampling technique that not only effectively localizes the relevant MS volume, but also increases the size of the training dataset and improves classification results. Additionally, we employ a multiple instance ensemble prediction method to further boost classification performance. Finally, we identify the optimal size of MS volumes to achieve the highest possible classification performance on our dataset. With our multiple instance ensemble prediction strategy and sampling strategy, our 3D CNNs achieve an F1 of 0.85 whereas without it, they achieve an F1 of 0.70. We demonstrate the feasibility of classifying anomalies in the MS. We propose a data enlarging strategy alongside a novel ensembling strategy that proves to be beneficial for paranasal anomaly classification in the MS.
ROJun 12, 2023
Collaborative Robotic Biopsy with Trajectory Guidance and Needle Tip Force FeedbackRobin Mieling, Maximilian Neidhardt, Sarah Latus et al.
The diagnostic value of biopsies is highly dependent on the placement of needles. Robotic trajectory guidance has been shown to improve needle positioning, but feedback for real-time navigation is limited. Haptic display of needle tip forces can provide rich feedback for needle navigation by enabling localization of tissue structures along the insertion path. We present a collaborative robotic biopsy system that combines trajectory guidance with kinesthetic feedback to assist the physician in needle placement. The robot aligns the needle while the insertion is performed in collaboration with a medical expert who controls the needle position on site. We present a needle design that senses forces at the needle tip based on optical coherence tomography and machine learning for real-time data processing. Our robotic setup allows operators to sense deep tissue interfaces independent of frictional forces to improve needle placement relative to a desired target structure. We first evaluate needle tip force sensing in ex-vivo tissue in a phantom study. We characterize the tip forces during insertions with constant velocity and demonstrate the ability to detect tissue interfaces in a collaborative user study. Participants are able to detect 91% of ex-vivo tissue interfaces based on needle tip force feedback alone. Finally, we demonstrate that even smaller, deep target structures can be accurately sampled by performing post-mortem in situ biopsies of the pancreas.
CVAug 17, 2022
Data-Efficient Vision Transformers for Multi-Label Disease Classification on Chest RadiographsFinn Behrendt, Debayan Bhattacharya, Julia Krüger et al.
Radiographs are a versatile diagnostic tool for the detection and assessment of pathologies, for treatment planning or for navigation and localization purposes in clinical interventions. However, their interpretation and assessment by radiologists can be tedious and error-prone. Thus, a wide variety of deep learning methods have been proposed to support radiologists interpreting radiographs. Mostly, these approaches rely on convolutional neural networks (CNN) to extract features from images. Especially for the multi-label classification of pathologies on chest radiographs (Chest X-Rays, CXR), CNNs have proven to be well suited. On the Contrary, Vision Transformers (ViTs) have not been applied to this task despite their high classification performance on generic images and interpretable local saliency maps which could add value to clinical interventions. ViTs do not rely on convolutions but on patch-based self-attention and in contrast to CNNs, no prior knowledge of local connectivity is present. While this leads to increased capacity, ViTs typically require an excessive amount of training data which represents a hurdle in the medical domain as high costs are associated with collecting large medical data sets. In this work, we systematically compare the classification performance of ViTs and CNNs for different data set sizes and evaluate more data-efficient ViT variants (DeiT). Our results show that while the performance between ViTs and CNNs is on par with a small benefit for ViTs, DeiTs outperform the former if a reasonably large data set is available for training.
IVApr 12, 2022
Unsupervised Anomaly Detection in 3D Brain MRI using Deep Learning with impured training dataFinn Behrendt, Marcel Bengs, Frederik Rogge et al.
The detection of lesions in magnetic resonance imaging (MRI)-scans of human brains remains challenging, time-consuming and error-prone. Recently, unsupervised anomaly detection (UAD) methods have shown promising results for this task. These methods rely on training data sets that solely contain healthy samples. Compared to supervised approaches, this significantly reduces the need for an extensive amount of labeled training data. However, data labelling remains error-prone. We study how unhealthy samples within the training data affect anomaly detection performance for brain MRI-scans. For our evaluations, we consider three publicly available data sets and use autoencoders (AE) as a well-established baseline method for UAD. We systematically evaluate the effect of impured training data by injecting different quantities of unhealthy samples to our training set of healthy samples from T1-weighted MRI-scans. We evaluate a method to identify falsely labeled samples directly during training based on the reconstruction error of the AE. Our results show that training with impured data decreases the UAD performance notably even with few falsely labeled samples. By performing outlier removal directly during training based on the reconstruction-loss, we demonstrate that falsely labeled data can be detected and removed to mitigate the effect of falsely labeled data. Overall, we highlight the importance of clean data sets for UAD in brain MRI and demonstrate an approach for detecting falsely labeled data directly during training.
IVJul 17, 2024
Leveraging the Mahalanobis Distance to enhance Unsupervised Brain MRI Anomaly DetectionFinn Behrendt, Debayan Bhattacharya, Robin Mieling et al.
Unsupervised Anomaly Detection (UAD) methods rely on healthy data distributions to identify anomalies as outliers. In brain MRI, a common approach is reconstruction-based UAD, where generative models reconstruct healthy brain MRIs, and anomalies are detected as deviations between input and reconstruction. However, this method is sensitive to imperfect reconstructions, leading to false positives that impede the segmentation. To address this limitation, we construct multiple reconstructions with probabilistic diffusion models. We then analyze the resulting distribution of these reconstructions using the Mahalanobis distance to identify anomalies as outliers. By leveraging information about normal variations and covariance of individual pixels within this distribution, we effectively refine anomaly scoring, leading to improved segmentation. Our experimental results demonstrate substantial performance improvements across various data sets. Specifically, compared to relying solely on single reconstructions, our approach achieves relative improvements of 15.9%, 35.4%, 48.0%, and 4.7% in terms of AUPRC for the BRATS21, ATLAS, MSLUB and WMH data sets, respectively.
IVApr 26, 2023
Tissue Classification During Needle Insertion Using Self-Supervised Contrastive Learning and Optical Coherence TomographyDebayan Bhattacharya, Sarah Latus, Finn Behrendt et al.
Needle positioning is essential for various medical applications such as epidural anaesthesia. Physicians rely on their instincts while navigating the needle in epidural spaces. Thereby, identifying the tissue structures may be helpful to the physician as they can provide additional feedback in the needle insertion process. To this end, we propose a deep neural network that classifies the tissues from the phase and intensity data of complex OCT signals acquired at the needle tip. We investigate the performance of the deep neural network in a limited labelled dataset scenario and propose a novel contrastive pretraining strategy that learns invariant representation for phase and intensity data. We show that with 10% of the training set, our proposed pretraining strategy helps the model achieve an F1 score of 0.84 whereas the model achieves an F1 score of 0.60 without it. Further, we analyse the importance of phase and intensity individually towards tissue classification.
IVDec 7, 2023Code
Guided Reconstruction with Conditioned Diffusion Models for Unsupervised Anomaly Detection in Brain MRIsFinn Behrendt, Debayan Bhattacharya, Robin Mieling et al.
The application of supervised models to clinical screening tasks is challenging due to the need for annotated data for each considered pathology. Unsupervised Anomaly Detection (UAD) is an alternative approach that aims to identify any anomaly as an outlier from a healthy training distribution. A prevalent strategy for UAD in brain MRI involves using generative models to learn the reconstruction of healthy brain anatomy for a given input image. As these models should fail to reconstruct unhealthy structures, the reconstruction errors indicate anomalies. However, a significant challenge is to balance the accurate reconstruction of healthy anatomy and the undesired replication of abnormal structures. While diffusion models have shown promising results with detailed and accurate reconstructions, they face challenges in preserving intensity characteristics, resulting in false positives. We propose conditioning the denoising process of diffusion models with additional information derived from a latent representation of the input image. We demonstrate that this conditioning allows for accurate and local adaptation to the general input intensity distribution while avoiding the replication of unhealthy structures. We compare the novel approach to different state-of-the-art methods and for different data sets. Our results show substantial improvements in the segmentation performance, with the Dice score improved by 11.9%, 20.0%, and 44.6%, for the BraTS, ATLAS and MSLUB data sets, respectively, while maintaining competitive performance on the WMH data set. Furthermore, our results indicate effective domain adaptation across different MRI acquisitions and simulated contrasts, an important attribute for general anomaly detection methods. The code for our work is available at https://github.com/FinnBehrendt/Conditioned-Diffusion-Models-UAD
LGOct 30, 2025
Enhancing ECG Classification Robustness with Lightweight Unsupervised Anomaly Detection FiltersMustafa Fuad Rifet Ibrahim, Maurice Meijer, Alexander Schlaefer et al.
Continuous electrocardiogram (ECG) monitoring via wearables offers significant potential for early cardiovascular disease (CVD) detection. However, deploying deep learning models for automated analysis in resource-constrained environments faces reliability challenges due to inevitable Out-of-Distribution (OOD) data. OOD inputs, such as unseen pathologies or noisecorrupted signals, often cause erroneous, high-confidence predictions by standard classifiers, compromising patient safety. Existing OOD detection methods either neglect computational constraints or address noise and unseen classes separately. This paper explores Unsupervised Anomaly Detection (UAD) as an independent, upstream filtering mechanism to improve robustness. We benchmark six UAD approaches, including Deep SVDD, reconstruction-based models, Masked Anomaly Detection, normalizing flows, and diffusion models, optimized via Neural Architecture Search (NAS) under strict resource constraints (at most 512k parameters). Evaluation on PTB-XL and BUT QDB datasets assessed detection of OOD CVD classes and signals unsuitable for analysis due to noise. Results show Deep SVDD consistently achieves the best trade-off between detection and efficiency. In a realistic deployment simulation, integrating the optimized Deep SVDD filter with a diagnostic classifier improved accuracy by up to 21 percentage points over a classifier-only baseline. This study demonstrates that optimized UAD filters can safeguard automated ECG analysis, enabling safer, more reliable continuous cardiovascular monitoring on wearables.
CVFeb 18, 2024Code
PolypNextLSTM: A lightweight and fast polyp video segmentation network using ConvNext and ConvLSTMDebayan Bhattacharya, Konrad Reuter, Finn Behrendt et al.
Commonly employed in polyp segmentation, single image UNet architectures lack the temporal insight clinicians gain from video data in diagnosing polyps. To mirror clinical practices more faithfully, our proposed solution, PolypNextLSTM, leverages video-based deep learning, harnessing temporal information for superior segmentation performance with the least parameter overhead, making it possibly suitable for edge devices. PolypNextLSTM employs a UNet-like structure with ConvNext-Tiny as its backbone, strategically omitting the last two layers to reduce parameter overhead. Our temporal fusion module, a Convolutional Long Short Term Memory (ConvLSTM), effectively exploits temporal features. Our primary novelty lies in PolypNextLSTM, which stands out as the leanest in parameters and the fastest model, surpassing the performance of five state-of-the-art image and video-based deep learning models. The evaluation of the SUN-SEG dataset spans easy-to-detect and hard-to-detect polyp scenarios, along with videos containing challenging artefacts like fast motion and occlusion. Comparison against 5 image-based and 5 video-based models demonstrates PolypNextLSTM's superiority, achieving a Dice score of 0.7898 on the hard-to-detect polyp test set, surpassing image-based PraNet (0.7519) and video-based PNSPlusNet (0.7486). Notably, our model excels in videos featuring complex artefacts such as ghosting and occlusion. PolypNextLSTM, integrating pruned ConvNext-Tiny with ConvLSTM for temporal fusion, not only exhibits superior segmentation performance but also maintains the highest frames per speed among evaluated models. Access code here https://github.com/mtec-tuhh/PolypNextLSTM
CVMar 23
6D Robotic OCT Scanning of Curved Tissue SurfacesSuresh Guttikonda, Maximilian Neidhardt, Vidas Raudonis et al.
Optical coherence tomography (OCT) is a non-invasive volumetric imaging modality with high spatial and temporal resolution. For imaging larger tissue structures, OCT probes need to be moved to scan the respective area. For handheld scanning, stitching of the acquired OCT volumes requires overlap to register the images. For robotic scanning and stitching, a typical approach is to restrict the motion to translations, as this avoids a full hand-eye calibration, which is complicated by the small field of view of most OCT probes. However, stitching by registration or by translational scanning are limited when curved tissue surfaces need to be scanned. We propose a marker for full six-dimensional hand-eye calibration of a robot mounted OCT probe. We show that the calibration results in highly repeatable estimates of the transformation. Moreover, we evaluate robotic scanning of two phantom surfaces to demonstrate that the proposed calibration allows for consistent scanning of large, curved tissue surfaces. As the proposed approach is not relying on image registration, it does not suffer from a potential accumulation of errors along a scan path. We also illustrate the improvement compared to conventional 3D-translational robotic scanning.
CVNov 10, 2025
Automated Estimation of Anatomical Risk Metrics for Endoscopic Sinus Surgery Using Deep LearningKonrad Reuter, Lennart Thaysen, Bilkay Doruk et al.
Endoscopic sinus surgery requires careful preoperative assessment of the skull base anatomy to minimize risks such as cerebrospinal fluid leakage. Anatomical risk scores like the Keros, Gera and Thailand-Malaysia-Singapore score offer a standardized approach but require time-consuming manual measurements on coronal CT or CBCT scans. We propose an automated deep learning pipeline that estimates these risk scores by localizing key anatomical landmarks via heatmap regression. We compare a direct approach to a specialized global-to-local learning strategy and find mean absolute errors on the relevant anatomical measurements of 0.506mm for the Keros, 4.516° for the Gera and 0.802mm / 0.777mm for the TMS classification.
IVApr 29, 2024Code
Self-supervised learning for classifying paranasal anomalies in the maxillary sinusDebayan Bhattacharya, Finn Behrendt, Benjamin Tobias Becker et al.
Purpose: Paranasal anomalies, frequently identified in routine radiological screenings, exhibit diverse morphological characteristics. Due to the diversity of anomalies, supervised learning methods require large labelled dataset exhibiting diverse anomaly morphology. Self-supervised learning (SSL) can be used to learn representations from unlabelled data. However, there are no SSL methods designed for the downstream task of classifying paranasal anomalies in the maxillary sinus (MS). Methods: Our approach uses a 3D Convolutional Autoencoder (CAE) trained in an unsupervised anomaly detection (UAD) framework. Initially, we train the 3D CAE to reduce reconstruction errors when reconstructing normal maxillary sinus (MS) image. Then, this CAE is applied to an unlabelled dataset to generate coarse anomaly locations by creating residual MS images. Following this, a 3D Convolutional Neural Network (CNN) reconstructs these residual images, which forms our SSL task. Lastly, we fine-tune the encoder part of the 3D CNN on a labelled dataset of normal and anomalous MS images. Results: The proposed SSL technique exhibits superior performance compared to existing generic self-supervised methods, especially in scenarios with limited annotated data. When trained on just 10% of the annotated dataset, our method achieves an Area Under the Precision-Recall Curve (AUPRC) of 0.79 for the downstream classification task. This performance surpasses other methods, with BYOL attaining an AUPRC of 0.75, SimSiam at 0.74, SimCLR at 0.73 and Masked Autoencoding using SparK at 0.75. Conclusion: A self-supervised learning approach that inherently focuses on localizing paranasal anomalies proves to be advantageous, particularly when the subsequent task involves differentiating normal from anomalous maxillary sinuses. Access our code at https://github.com/mtec-tuhh/self-supervised-paranasal-anomaly
IVFeb 2, 2022Code
Posterior temperature optimized Bayesian models for inverse problems in medical imagingMax-Heinrich Laves, Malte Tölle, Alexander Schlaefer et al.
We present Posterior Temperature Optimized Bayesian Inverse Models (POTOBIM), an unsupervised Bayesian approach to inverse problems in medical imaging using mean-field variational inference with a fully tempered posterior. Bayesian methods exhibit useful properties for approaching inverse tasks, such as tomographic reconstruction or image denoising. A suitable prior distribution introduces regularization, which is needed to solve the ill-posed problem and reduces overfitting the data. In practice, however, this often results in a suboptimal posterior temperature, and the full potential of the Bayesian approach is not being exploited. In POTOBIM, we optimize both the parameters of the prior distribution and the posterior temperature with respect to reconstruction accuracy using Bayesian optimization with Gaussian process regression. Our method is extensively evaluated on four different inverse tasks on a variety of modalities with images from public data sets and we demonstrate that an optimized posterior temperature outperforms both non-Bayesian and Bayesian approaches without temperature optimization. The use of an optimized prior distribution and posterior temperature leads to improved accuracy and uncertainty estimation and we show that it is sufficient to find these hyperparameters per task domain. Well-tempered posteriors yield calibrated uncertainty, which increases the reliability in the predictions. Our source code is publicly available at github.com/Cardio-AI/mfvi-dip-mia.
IVJan 4, 2024
Nodule detection and generation on chest X-rays: NODE21 ChallengeEcem Sogancioglu, Bram van Ginneken, Finn Behrendt et al.
Pulmonary nodules may be an early manifestation of lung cancer, the leading cause of cancer-related deaths among both men and women. Numerous studies have established that deep learning methods can yield high-performance levels in the detection of lung nodules in chest X-rays. However, the lack of gold-standard public datasets slows down the progression of the research and prevents benchmarking of methods for this task. To address this, we organized a public research challenge, NODE21, aimed at the detection and generation of lung nodules in chest X-rays. While the detection track assesses state-of-the-art nodule detection systems, the generation track determines the utility of nodule generation algorithms to augment training data and hence improve the performance of the detection systems. This paper summarizes the results of the NODE21 challenge and performs extensive additional experiments to examine the impact of the synthetically generated nodule training images on the detection algorithm performance.
IVMar 21, 2024
Diffusion Models with Ensembled Structure-Based Anomaly Scoring for Unsupervised Anomaly DetectionFinn Behrendt, Debayan Bhattacharya, Lennart Maack et al.
Supervised deep learning techniques show promise in medical image analysis. However, they require comprehensive annotated data sets, which poses challenges, particularly for rare diseases. Consequently, unsupervised anomaly detection (UAD) emerges as a viable alternative for pathology segmentation, as only healthy data is required for training. However, recent UAD anomaly scoring functions often focus on intensity only and neglect structural differences, which impedes the segmentation performance. This work investigates the potential of Structural Similarity (SSIM) to bridge this gap. SSIM captures both intensity and structural disparities and can be advantageous over the classical $l1$ error. However, we show that there is more than one optimal kernel size for the SSIM calculation for different pathologies. Therefore, we investigate an adaptive ensembling strategy for various kernel sizes to offer a more pathology-agnostic scoring mechanism. We demonstrate that this ensembling strategy can enhance the performance of DMs and mitigate the sensitivity to different kernel sizes across varying pathologies, highlighting its promise for brain MRI anomaly detection.
CVApr 1
An Approach to Enriching Surgical Video Datasets for Fine-Grained Spatial-Temporal Understanding of Vision-Language ModelsLennart Maack, Alexander Schlaefer
Surgical video understanding is a crucial prerequisite for advancing Computer-Assisted Surgery. While vision-language models (VLMs) have recently been applied to the surgical domain, existing surgical vision-language datasets lack in capturing and evaluating complex, interleaved spatial-temporal dynamics. Creating large scale datasets that accurately represent fine-grained spatial-temporal relationships in surgical videos is challenging due to costly manual annotations or error-prone generation using large language models. To address this gap, we introduce the SurgSTU-Pipeline, a deterministic generation pipeline featuring temporal and spatial continuity filtering to reliably create surgical datasets for fine-grained spatial-temporal multimodal understanding. Applying this pipeline to publicly available surgical datasets, we create the SurgSTU dataset, comprising 7515 video clips densely extended with 150k fine-grained spatial-temporal question-answer samples. Our comprehensive evaluation shows that while state-of-the-art generalist VLMs struggle in zero-shot settings, their spatial-temporal capabilities can be improved through in-context learning. A fine-tuned VLM on the SurgSTU training dataset achieves highest performance among all spatial-temporal tasks, validating the dataset's efficacy to improve spatial-temporal understanding of VLMs in surgical videos. Code will be made publicly available.
CVDec 5, 2025
Distilling Expert Surgical Knowledge: How to train local surgical VLMs for anatomy explanation in Complete Mesocolic ExcisionLennart Maack, Julia-Kristin Graß, Lisa-Marie Toscha et al.
Recently, Vision Large Language Models (VLMs) have demonstrated high potential in computer-aided diagnosis and decision-support. However, current VLMs show deficits in domain specific surgical scene understanding, such as identifying and explaining anatomical landmarks during Complete Mesocolic Excision. Additionally, there is a need for locally deployable models to avoid patient data leakage to large VLMs, hosted outside the clinic. We propose a privacy-preserving framework to distill knowledge from large, general-purpose LLMs into an efficient, local VLM. We generate an expert-supervised dataset by prompting a teacher LLM without sensitive images, using only textual context and binary segmentation masks for spatial information. This dataset is used for Supervised Fine-Tuning (SFT) and subsequent Direct Preference Optimization (DPO) of the locally deployable VLM. Our evaluation confirms that finetuning VLMs with our generated datasets increases surgical domain knowledge compared to its base VLM by a large margin. Overall, this work validates a data-efficient and privacy-conforming way to train a surgical domain optimized, locally deployable VLM for surgical scene understanding.
LGOct 21, 2025
Prototyping an End-to-End Multi-Modal Tiny-CNN for Cardiovascular Sensor PatchesMustafa Fuad Rifet Ibrahim, Tunc Alkanat, Maurice Meijer et al.
The vast majority of cardiovascular diseases may be preventable if early signs and risk factors are detected. Cardiovascular monitoring with body-worn sensor devices like sensor patches allows for the detection of such signs while preserving the freedom and comfort of patients. However, the analysis of the sensor data must be robust, reliable, efficient, and highly accurate. Deep learning methods can automate data interpretation, reducing the workload of clinicians. In this work, we analyze the feasibility of applying deep learning models to the classification of synchronized electrocardiogram (ECG) and phonocardiogram (PCG) recordings on resource-constrained medical edge devices. We propose a convolutional neural network with early fusion of data to solve a binary classification problem. We train and validate our model on the synchronized ECG and PCG recordings from the Physionet Challenge 2016 dataset. Our approach reduces memory footprint and compute cost by three orders of magnitude compared to the state-of-the-art while maintaining competitive accuracy. We demonstrate the applicability of our proposed model on medical edge devices by analyzing energy consumption on a microcontroller and an experimental sensor device setup, confirming that on-device inference can be more energy-efficient than continuous data streaming.
CVAug 11, 2025
Tracking Any Point Methods for Markerless 3D Tissue Tracking in Endoscopic Stereo ImagesKonrad Reuter, Suresh Guttikonda, Sarah Latus et al.
Minimally invasive surgery presents challenges such as dynamic tissue motion and a limited field of view. Accurate tissue tracking has the potential to support surgical guidance, improve safety by helping avoid damage to sensitive structures, and enable context-aware robotic assistance during complex procedures. In this work, we propose a novel method for markerless 3D tissue tracking by leveraging 2D Tracking Any Point (TAP) networks. Our method combines two CoTracker models, one for temporal tracking and one for stereo matching, to estimate 3D motion from stereo endoscopic images. We evaluate the system using a clinical laparoscopic setup and a robotic arm simulating tissue motion, with experiments conducted on a synthetic 3D-printed phantom and a chicken tissue phantom. Tracking on the chicken tissue phantom yielded more reliable results, with Euclidean distance errors as low as 1.1 mm at a velocity of 10 mm/s. These findings highlight the potential of TAP-based models for accurate, markerless 3D tracking in challenging surgical scenarios.
CVAug 7, 2025
Robust Tracking with Particle Filtering for Fluorescent Cardiac ImagingSuresh Guttikonda, Maximilian Neidhart, Johanna Sprenger et al.
Intraoperative fluorescent cardiac imaging enables quality control following coronary bypass grafting surgery. We can estimate local quantitative indicators, such as cardiac perfusion, by tracking local feature points. However, heart motion and significant fluctuations in image characteristics caused by vessel structural enrichment limit traditional tracking methods. We propose a particle filtering tracker based on cyclicconsistency checks to robustly track particles sampled to follow target landmarks. Our method tracks 117 targets simultaneously at 25.4 fps, allowing real-time estimates during interventions. It achieves a tracking error of (5.00 +/- 0.22 px) and outperforms other deep learning trackers (22.3 +/- 1.1 px) and conventional trackers (58.1 +/- 27.1 px).
IVAug 1, 2025
Do We Need Pre-Processing for Deep Learning Based Ultrasound Shear Wave Elastography?Sarah Grube, Sören Grünhagen, Sarah Latus et al.
Estimating the elasticity of soft tissue can provide useful information for various diagnostic applications. Ultrasound shear wave elastography offers a non-invasive approach. However, its generalizability and standardization across different systems and processing pipelines remain limited. Considering the influence of image processing on ultrasound based diagnostics, recent literature has discussed the impact of different image processing steps on reliable and reproducible elasticity analysis. In this work, we investigate the need of ultrasound pre-processing steps for deep learning-based ultrasound shear wave elastography. We evaluate the performance of a 3D convolutional neural network in predicting shear wave velocities from spatio-temporal ultrasound images, studying different degrees of pre-processing on the input images, ranging from fully beamformed and filtered ultrasound images to raw radiofrequency data. We compare the predictions from our deep learning approach to a conventional time-of-flight method across four gelatin phantoms with different elasticity levels. Our results demonstrate statistically significant differences in the predicted shear wave velocity among all elasticity groups, regardless of the degree of pre-processing. Although pre-processing slightly improves performance metrics, our results show that the deep learning approach can reliably differentiate between elasticity groups using raw, unprocessed radiofrequency data. These results show that deep learning-based approaches could reduce the need for and the bias of traditional ultrasound pre-processing steps in ultrasound shear wave elastography, enabling faster and more reliable clinical elasticity assessments.
ROFeb 20, 2025
A Mobile Robotic Approach to Autonomous Surface Scanning in Legal MedicineSarah Grube, Sarah Latus, Martin Fischer et al.
Purpose: Comprehensive legal medicine documentation includes both an internal but also an external examination of the corpse. Typically, this documentation is conducted manually during conventional autopsy. A systematic digital documentation would be desirable, especially for the external examination of wounds, which is becoming more relevant for legal medicine analysis. For this purpose, RGB surface scanning has been introduced. While a manual full surface scan using a handheld camera is timeconsuming and operator dependent, floor or ceiling mounted robotic systems require substantial space and a dedicated room. Hence, we consider whether a mobile robotic system can be used for external documentation. Methods: We develop a mobile robotic system that enables full-body RGB-D surface scanning. Our work includes a detailed configuration space analysis to identify the environmental parameters that need to be considered to successfully perform a surface scan. We validate our findings through an experimental study in the lab and demonstrate the system's application in a legal medicine environment. Results: Our configuration space analysis shows that a good trade-off between coverage and time is reached with three robot base positions, leading to a coverage of 94.96 %. Experiments validate the effectiveness of the system in accurately capturing body surface geometry with an average surface coverage of 96.90 +- 3.16 % and 92.45 +- 1.43 % for a body phantom and actual corpses, respectively. Conclusion: This work demonstrates the potential of a mobile robotic system to automate RGB-D surface scanning in legal medicine, complementing the use of post-mortem CT scans for inner documentation. Our results indicate that the proposed system can contribute to more efficient and autonomous legal medicine documentation, reducing the need for manual intervention.
IVJan 31, 2022
Unsupervised Anomaly Detection in 3D Brain MRI using Deep Learning with Multi-Task Brain Age PredictionMarcel Bengs, Finn Behrendt, Max-Heinrich Laves et al.
Lesion detection in brain Magnetic Resonance Images (MRIs) remains a challenging task. MRIs are typically read and interpreted by domain experts, which is a tedious and time-consuming process. Recently, unsupervised anomaly detection (UAD) in brain MRI with deep learning has shown promising results to provide a quick, initial assessment. So far, these methods only rely on the visual appearance of healthy brain anatomy for anomaly detection. Another biomarker for abnormal brain development is the deviation between the brain age and the chronological age, which is unexplored in combination with UAD. We propose deep learning for UAD in 3D brain MRI considering additional age information. We analyze the value of age information during training, as an additional anomaly score, and systematically study several architecture concepts. Based on our analysis, we propose a novel deep learning approach for UAD with multi-task age prediction. We use clinical T1-weighted MRIs of 1735 healthy subjects and the publicly available BraTs 2019 data set for our study. Our novel approach significantly improves UAD performance with an AUC of 92.60% compared to an AUC-score of 84.37% using previous approaches without age information.
ROJan 28, 2022
Robotic Tissue Sampling for Safe Post-mortem Biopsy in Infectious CorpsesMaximilian Neidhardt, Stefan Gerlach, Robin Mieling et al.
In pathology and legal medicine, the histopathological and microbiological analysis of tissue samples from infected deceased is a valuable information for developing treatment strategies during a pandemic such as COVID-19. However, a conventional autopsy carries the risk of disease transmission and may be rejected by relatives. We propose minimally invasive biopsy with robot assistance under CT guidance to minimize the risk of disease transmission during tissue sampling and to improve accuracy. A flexible robotic system for biopsy sampling is presented, which is applied to human corpses placed inside protective body bags. An automatic planning and decision system estimates optimal insertion point. Heat maps projected onto the segmented skin visualize the distance and angle of insertions and estimate the minimum cost of a puncture while avoiding bone collisions. Further, we test multiple insertion paths concerning feasibility and collisions. A custom end effector is designed for inserting needles and extracting tissue samples under robotic guidance. Our robotic post-mortem biopsy (RPMB) system is evaluated in a study during the COVID-19 pandemic on 20 corpses and 10 tissue targets, 5 of them being infected with SARS-CoV-2. The mean planning time including robot path planning is (5.72+-1.67) s. Mean needle placement accuracy is (7.19+-4.22) mm.
IVOct 17, 2021
Self-Supervised U-Net for Segmenting Flat and Sessile PolypsDebayan Bhattacharya, Christian Betz, Dennis Eggert et al.
Colorectal Cancer(CRC) poses a great risk to public health. It is the third most common cause of cancer in the US. Development of colorectal polyps is one of the earliest signs of cancer. Early detection and resection of polyps can greatly increase survival rate to 90%. Manual inspection can cause misdetections because polyps vary in color, shape, size and appearance. To this end, Computer-Aided Diagnosis systems(CADx) has been proposed that detect polyps by processing the colonoscopic videos. The system acts a secondary check to help clinicians reduce misdetections so that polyps may be resected before they transform to cancer. Polyps vary in color, shape, size, texture and appearance. As a result, the miss rate of polyps is between 6% and 27% despite the prominence of CADx solutions. Furthermore, sessile and flat polyps which have diameter less than 10 mm are more likely to be undetected. Convolutional Neural Networks(CNN) have shown promising results in polyp segmentation. However, all of these works have a supervised approach and are limited by the size of the dataset. It was observed that smaller datasets reduce the segmentation accuracy of ResUNet++. We train a U-Net to inpaint randomly dropped out pixels in the image as a proxy task. The dataset we use for pre-training is Kvasir-SEG dataset. This is followed by a supervised training on the limited Kvasir-Sessile dataset. Our experimental results demonstrate that with limited annotated dataset and a larger unlabeled dataset, self-supervised approach is a better alternative than fully supervised approach. Specifically, our self-supervised U-Net performs better than five segmentation models which were trained in supervised manner on the Kvasir-Sessile dataset.
IVSep 20, 2021
A novel optical needle probe for deep learning-based tissue elasticity characterizationRobin Mieling, Johanna Sprenger, Sarah Latus et al.
The distinction between malignant and benign tumors is essential to the treatment of cancer. The tissue's elasticity can be used as an indicator for the required tissue characterization. Optical coherence elastography (OCE) probes have been proposed for needle insertions but have so far lacked the necessary load sensing capabilities. We present a novel OCE needle probe that provides simultaneous optical coherence tomography (OCT) imaging and load sensing at the needle tip. We demonstrate the application of the needle probe in indentation experiments on gelatin phantoms with varying gelatin concentrations. We further implement two deep learning methods for the end-to-end sample characterization from the acquired OCT data. We report the estimation of gelatin sample concentrations in unseen samples with a mean error of $1.21 \pm 0.91$ wt\%. Both evaluated deep learning models successfully provide sample characterization with different advantages regarding the accuracy and inference time.
IVSep 14, 2021
Multi-Scale Input Strategies for Medulloblastoma Tumor Classification using Deep Transfer LearningMarcel Bengs, Satish Pant, Michael Bockmayr et al.
Medulloblastoma (MB) is a primary central nervous system tumor and the most common malignant brain cancer among children. Neuropathologists perform microscopic inspection of histopathological tissue slides under a microscope to assess the severity of the tumor. This is a time-consuming task and often infused with observer variability. Recently, pre-trained convolutional neural networks (CNN) have shown promising results for MB subtype classification. Typically, high-resolution images are divided into smaller tiles for classification, while the size of the tiles has not been systematically evaluated. We study the impact of tile size and input strategy and classify the two major histopathological subtypes-Classic and Demoplastic/Nodular. To this end, we use recently proposed EfficientNets and evaluate tiles with increasing size combined with various downsampling scales. Our results demonstrate using large input tiles pixels followed by intermediate downsampling and patch cropping significantly improves MB classification performance. Our top-performing method achieves the AUC-ROC value of 90.90\% compared to 84.53\% using the previous approach with smaller input tiles.
IVSep 14, 2021
3-Dimensional Deep Learning with Spatial Erasing for Unsupervised Anomaly Segmentation in Brain MRIMarcel Bengs, Finn Behrendt, Julia Krüger et al.
Purpose. Brain Magnetic Resonance Images (MRIs) are essential for the diagnosis of neurological diseases. Recently, deep learning methods for unsupervised anomaly detection (UAD) have been proposed for the analysis of brain MRI. These methods rely on healthy brain MRIs and eliminate the requirement of pixel-wise annotated data compared to supervised deep learning. While a wide range of methods for UAD have been proposed, these methods are mostly 2D and only learn from MRI slices, disregarding that brain lesions are inherently 3D and the spatial context of MRI volumes remains unexploited. Methods. We investigate whether using increased spatial context by using MRI volumes combined with spatial erasing leads to improved unsupervised anomaly segmentation performance compared to learning from slices. We evaluate and compare 2D variational autoencoder (VAE) to their 3D counterpart, propose 3D input erasing, and systemically study the impact of the data set size on the performance. Results. Using two publicly available segmentation data sets for evaluation, 3D VAE outperform their 2D counterpart, highlighting the advantage of volumetric context. Also, our 3D erasing methods allow for further performance improvements. Our best performing 3D VAE with input erasing leads to an average DICE score of 31.40% compared to 25.76% for the 2D VAE. Conclusions. We propose 3D deep learning methods for UAD in brain MRI combined with 3D erasing and demonstrate that 3D methods clearly outperform their 2D counterpart for anomaly segmentation. Also, our spatial erasing method allows for further performance improvements and reduces the requirement for large data sets.
IVSep 10, 2021
Medulloblastoma Tumor Classification using Deep Transfer Learning with Multi-Scale EfficientNetsMarcel Bengs, Michael Bockmayr, Ulrich Schüller et al.
Medulloblastoma (MB) is the most common malignant brain tumor in childhood. The diagnosis is generally based on the microscopic evaluation of histopathological tissue slides. However, visual-only assessment of histopathological patterns is a tedious and time-consuming task and is also affected by observer variability. Hence, automated MB tumor classification could assist pathologists by promoting consistency and robust quantification. Recently, convolutional neural networks (CNNs) have been proposed for this task, while transfer learning has shown promising results. In this work, we propose an end-to-end MB tumor classification and explore transfer learning with various input sizes and matching network dimensions. We focus on differentiating between the histological subtypes classic and desmoplastic/nodular. For this purpose, we systematically evaluate recently proposed EfficientNets, which uniformly scale all dimensions of a CNN. Using a data set with 161 cases, we demonstrate that pre-trained EfficientNets with larger input resolutions lead to significant performance improvements compared to commonly used pre-trained CNN architectures. Also, we highlight the importance of transfer learning, when using such large architectures. Overall, our best performing method achieves an F1-Score of 80.1%.
IVJun 11, 2021
Posterior Temperature Optimization in Variational Inference for Inverse ProblemsMax-Heinrich Laves, Malte Tölle, Alexander Schlaefer et al.
Bayesian methods feature useful properties for solving inverse problems, such as tomographic reconstruction. The prior distribution introduces regularization, which helps solving the ill-posed problem and reduces overfitting. In practice, this often results in a suboptimal posterior temperature and the full potential of the Bayesian approach is not realized. In this paper, we optimize both the parameters of the prior distribution and the posterior temperature using Bayesian optimization. Well-tempered posteriors lead to better predictive performance and improved uncertainty calibration, which we demonstrate for the task of sparse-view CT reconstruction.
IVAug 5, 2020
Multiple Sclerosis Lesion Activity Segmentation with Attention-Guided Two-Path CNNsNils Gessert, Julia Krüger, Roland Opfer et al.
Multiple sclerosis is an inflammatory autoimmune demyelinating disease that is characterized by lesions in the central nervous system. Typically, magnetic resonance imaging (MRI) is used for tracking disease progression. Automatic image processing methods can be used to segment lesions and derive quantitative lesion parameters. So far, methods have focused on lesion segmentation for individual MRI scans. However, for monitoring disease progression, \textit{lesion activity} in terms of new and enlarging lesions between two time points is a crucial biomarker. For this problem, several classic methods have been proposed, e.g., using difference volumes. Despite their success for single-volume lesion segmentation, deep learning approaches are still rare for lesion activity segmentation. In this work, convolutional neural networks (CNNs) are studied for lesion activity segmentation from two time points. For this task, CNNs are designed and evaluated that combine the information from two points in different ways. In particular, two-path architectures with attention-guided interactions are proposed that enable effective information exchange between the two time point's processing paths. It is demonstrated that deep learning-based methods outperform classic approaches and it is shown that attention-guided interactions significantly improve performance. Furthermore, the attention modules produce plausible attention maps that have a masking effect that suppresses old, irrelevant lesions. A lesion-wise false positive rate of 26.4% is achieved at a true positive rate of 74.2%, which is not significantly different from the interrater performance.
IVJul 2, 2020
4D Spatio-Temporal Convolutional Networks for Object Position Estimation in OCT VolumesMarcel Bengs, Nils Gessert, Alexander Schlaefer
Tracking and localizing objects is a central problem in computer-assisted surgery. Optical coherence tomography (OCT) can be employed as an optical tracking system, due to its high spatial and temporal resolution. Recently, 3D convolutional neural networks (CNNs) have shown promising performance for pose estimation of a marker object using single volumetric OCT images. While this approach relied on spatial information only, OCT allows for a temporal stream of OCT image volumes capturing the motion of an object at high volumes rates. In this work, we systematically extend 3D CNNs to 4D spatio-temporal CNNs to evaluate the impact of additional temporal information for marker object tracking. Across various architectures, our results demonstrate that using a stream of OCT volumes and employing 4D spatio-temporal convolutions leads to a 30% lower mean absolute error compared to single volume processing with 3D CNNs.
IVJul 2, 2020
Spectral-Spatial Recurrent-Convolutional Networks for In-Vivo Hyperspectral Tumor Type ClassificationMarcel Bengs, Nils Gessert, Wiebke Laffers et al.
Early detection of cancerous tissue is crucial for long-term patient survival. In the head and neck region, a typical diagnostic procedure is an endoscopic intervention where a medical expert manually assesses tissue using RGB camera images. While healthy and tumor regions are generally easier to distinguish, differentiating benign and malignant tumors is very challenging. This requires an invasive biopsy, followed by histological evaluation for diagnosis. Also, during tumor resection, tumor margins need to be verified by histological analysis. To avoid unnecessary tissue resection, a non-invasive, image-based diagnostic tool would be very valuable. Recently, hyperspectral imaging paired with deep learning has been proposed for this task, demonstrating promising results on ex-vivo specimens. In this work, we demonstrate the feasibility of in-vivo tumor type classification using hyperspectral imaging and deep learning. We analyze the value of using multiple hyperspectral bands compared to conventional RGB images and we study several machine learning models' ability to make use of the additional spectral information. Based on our insights, we address spectral and spatial processing using recurrent-convolutional models for effective spectral aggregating and spatial feature learning. Our best model achieves an AUC of 76.3%, significantly outperforming previous conventional and deep learning methods.
CVMay 20, 2020
Deep learning with 4D spatio-temporal data representations for OCT-based force estimationNils Gessert, Marcel Bengs, Matthias Schlüter et al.
Estimating the forces acting between instruments and tissue is a challenging problem for robot-assisted minimally-invasive surgery. Recently, numerous vision-based methods have been proposed to replace electro-mechanical approaches. Moreover, optical coherence tomography (OCT) and deep learning have been used for estimating forces based on deformation observed in volumetric image data. The method demonstrated the advantage of deep learning with 3D volumetric data over 2D depth images for force estimation. In this work, we extend the problem of deep learning-based force estimation to 4D spatio-temporal data with streams of 3D OCT volumes. For this purpose, we design and evaluate several methods extending spatio-temporal deep learning to 4D which is largely unexplored so far. Furthermore, we provide an in-depth analysis of multi-dimensional image data representations for force estimation, comparing our 4D approach to previous, lower-dimensional methods. Also, we analyze the effect of temporal information and we study the prediction of short-term future force values, which could facilitate safety features. For our 4D force estimation architectures, we find that efficient decoupling of spatial and temporal processing is advantageous. We show that using 4D spatio-temporal data outperforms all previously used data representations with a mean absolute error of 10.7mN. We find that temporal information is valuable for force estimation and we demonstrate the feasibility of force prediction.
IVApr 21, 2020
4D Spatio-Temporal Deep Learning with 4D fMRI Data for Autism Spectrum Disorder ClassificationMarcel Bengs, Nils Gessert, Alexander Schlaefer
Autism spectrum disorder (ASD) is associated with behavioral and communication problems. Often, functional magnetic resonance imaging (fMRI) is used to detect and characterize brain changes related to the disorder. Recently, machine learning methods have been employed to reveal new patterns by trying to classify ASD from spatio-temporal fMRI images. Typically, these methods have either focused on temporal or spatial information processing. Instead, we propose a 4D spatio-temporal deep learning approach for ASD classification where we jointly learn from spatial and temporal data. We employ 4D convolutional neural networks and convolutional-recurrent models which outperform a previous approach with an F1-score of 0.71 compared to an F1-score of 0.65.
IVApr 21, 2020
Spatio-spectral deep learning methods for in-vivo hyperspectral laryngeal cancer detectionMarcel Bengs, Stephan Westermann, Nils Gessert et al.
Early detection of head and neck tumors is crucial for patient survival. Often, diagnoses are made based on endoscopic examination of the larynx followed by biopsy and histological analysis, leading to a high inter-observer variability due to subjective assessment. In this regard, early non-invasive diagnostics independent of the clinician would be a valuable tool. A recent study has shown that hyperspectral imaging (HSI) can be used for non-invasive detection of head and neck tumors, as precancerous or cancerous lesions show specific spectral signatures that distinguish them from healthy tissue. However, HSI data processing is challenging due to high spectral variations, various image interferences, and the high dimensionality of the data. Therefore, performance of automatic HSI analysis has been limited and so far, mostly ex-vivo studies have been presented with deep learning. In this work, we analyze deep learning techniques for in-vivo hyperspectral laryngeal cancer detection. For this purpose we design and evaluate convolutional neural networks (CNNs) with 2D spatial or 3D spatio-spectral convolutions combined with a state-of-the-art Densenet architecture. For evaluation, we use an in-vivo data set with HSI of the oral cavity or oropharynx. Overall, we present multiple deep learning techniques for in-vivo laryngeal cancer detection based on HSI and we show that jointly learning from the spatial and spectral domain improves classification accuracy notably. Our 3D spatio-spectral Densenet achieves an average accuracy of 81%.
IVApr 21, 2020
A Deep Learning Approach for Motion Forecasting Using 4D OCT DataMarcel Bengs, Nils Gessert, Alexander Schlaefer
Forecasting motion of a specific target object is a common problem for surgical interventions, e.g. for localization of a target region, guidance for surgical interventions, or motion compensation. Optical coherence tomography (OCT) is an imaging modality with a high spatial and temporal resolution. Recently, deep learning methods have shown promising performance for OCT-based motion estimation based on two volumetric images. We extend this approach and investigate whether using a time series of volumes enables motion forecasting. We propose 4D spatio-temporal deep learning for end-to-end motion forecasting and estimation using a stream of OCT volumes. We design and evaluate five different 3D and 4D deep learning methods using a tissue data set. Our best performing 4D method achieves motion forecasting with an overall average correlation coefficient of 97.41%, while also improving motion estimation performance by a factor of 2.5 compared to a previous 3D approach.
IVApr 21, 2020
Spatio-Temporal Deep Learning Methods for Motion Estimation Using 4D OCT Image DataMarcel Bengs, Nils Gessert, Matthias Schlüter et al.
Purpose. Localizing structures and estimating the motion of a specific target region are common problems for navigation during surgical interventions. Optical coherence tomography (OCT) is an imaging modality with a high spatial and temporal resolution that has been used for intraoperative imaging and also for motion estimation, for example, in the context of ophthalmic surgery or cochleostomy. Recently, motion estimation between a template and a moving OCT image has been studied with deep learning methods to overcome the shortcomings of conventional, feature-based methods. Methods. We investigate whether using a temporal stream of OCT image volumes can improve deep learning-based motion estimation performance. For this purpose, we design and evaluate several 3D and 4D deep learning methods and we propose a new deep learning approach. Also, we propose a temporal regularization strategy at the model output. Results. Using a tissue dataset without additional markers, our deep learning methods using 4D data outperform previous approaches. The best performing 4D architecture achieves an correlation coefficient (aCC) of 98.58% compared to 85.0% of a previous 3D deep learning method. Also, our temporal regularization strategy at the output further improves 4D model performance to an aCC of 99.06%. In particular, our 4D method works well for larger motion and is robust towards image rotations and motion distortions. Conclusions. We propose 4D spatio-temporal deep learning for OCT-based motion estimation. On a tissue dataset, we find that using 4D information for the model input improves performance while maintaining reasonable inference times. Our regularization strategy demonstrates that additional temporal information is also beneficial at the model output.
CVApr 20, 2020
4D Deep Learning for Multiple Sclerosis Lesion Activity SegmentationNils Gessert, Marcel Bengs, Julia Krüger et al.
Multiple sclerosis lesion activity segmentation is the task of detecting new and enlarging lesions that appeared between a baseline and a follow-up brain MRI scan. While deep learning methods for single-scan lesion segmentation are common, deep learning approaches for lesion activity have only been proposed recently. Here, a two-path architecture processes two 3D MRI volumes from two time points. In this work, we investigate whether extending this problem to full 4D deep learning using a history of MRI volumes and thus an extended baseline can improve performance. For this purpose, we design a recurrent multi-encoder-decoder architecture for processing 4D data. We find that adding more temporal information is beneficial and our proposed architecture outperforms previous approaches with a lesion-wise true positive rate of 0.84 at a lesion-wise false positive rate of 0.19.
CVJan 25, 2020
Learning Preference-Based Similarities from Face Images using Siamese Multi-Task CNNsNils Gessert, Alexander Schlaefer
Online dating has become a common occurrence over the last few decades. A key challenge for online dating platforms is to determine suitable matches for their users. A lot of dating services rely on self-reported user traits and preferences for matching. At the same time, some services largely rely on user images and thus initial visual preference. Especially for the latter approach, previous research has attempted to capture users' visual preferences for automatic match recommendation. These approaches are mostly based on the assumption that physical attraction is the key factor for relationship formation and personal preferences, interests, and attitude are largely neglected. Deep learning approaches have shown that a variety of properties can be predicted from human faces to some degree, including age, health and even personality traits. Therefore, we investigate the feasibility of bridging image-based matching and matching with personal interests, preferences, and attitude. We approach the problem in a supervised manner by predicting similarity scores between two users based on images of their faces only. The ground-truth for the similarity matching scores is determined by a test that aims to capture users' preferences, interests, and attitude that are relevant for forming romantic relationships. The images are processed by a Siamese Multi-Task deep learning architecture. We find a statistically significant correlation between predicted and target similarity scores. Thus, our results indicate that learning similarities in terms of interests, preferences, and attitude from face images appears to be feasible to some degree.
CVNov 6, 2019
Melanoma detection with electrical impedance spectroscopy and dermoscopy using joint deep learning modelsNils Gessert, Marcel Bengs, Alexander Schlaefer
The initial assessment of skin lesions is typically based on dermoscopic images. As this is a difficult and time-consuming task, machine learning methods using dermoscopic images have been proposed to assist human experts. Other approaches have studied electrical impedance spectroscopy (EIS) as a basis for clinical decision support systems. Both methods represent different ways of measuring skin lesion properties as dermoscopy relies on visible light and EIS uses electric currents. Thus, the two methods might carry complementary features for lesion classification. Therefore, we propose joint deep learning models considering both EIS and dermoscopy for melanoma detection. For this purpose, we first study machine learning methods for EIS that incorporate domain knowledge and previously used heuristics into the design process. As a result, we propose a recurrent model with state-max-pooling which automatically learns the relevance of different EIS measurements. Second, we combine this new model with different convolutional neural networks that process dermoscopic images. We study ensembling approaches and also propose a cross-attention module guiding information exchange between the EIS and dermoscopy model. In general, combinations of EIS and dermoscopy clearly outperform models that only use either EIS or dermoscopy. We show that our attention-based, combined model outperforms other models with specificities of 34.4% (CI 31.3-38.4), 34.7% (CI 31.0-38.8) and 53.7% (CI 50.1-57.6) for dermoscopy, EIS and the combined model, respectively, at a clinically relevant sensitivity of 98%.
CVOct 9, 2019
Skin Lesion Classification Using Ensembles of Multi-Resolution EfficientNets with Meta DataNils Gessert, Maximilian Nielsen, Mohsin Shaikh et al.
In this paper, we describe our method for the ISIC 2019 Skin Lesion Classification Challenge. The challenge comes with two tasks. For task 1, skin lesions have to be classified based on dermoscopic images. For task 2, dermoscopic images and additional patient meta data have to be used. A diverse dataset of 25000 images was provided for training, containing images from eight classes. The final test set contains an additional, unknown class. We address this challenging problem with a simple, data driven approach by including external data with skin lesions types that are not present in the training set. Furthermore, multi-class skin lesion classification comes with the problem of severe class imbalance. We try to overcome this problem by using loss balancing. Also, the dataset contains images with very different resolutions. We take care of this property by considering different model input resolutions and different cropping strategies. To incorporate meta data such as age, anatomical site, and sex, we use an additional dense neural network and fuse its features with the CNN. We aggregate all our models with an ensembling strategy where we search for the optimal subset of models. Our best ensemble achieves a balanced accuracy of 74.2% using five-fold cross-validation. On the official test set our method is ranked first for both tasks with a balanced accuracy of 63.6% for task 1 and 63.4% for task 2.
CVAug 12, 2019
Towards Deep Learning-Based EEG Electrode Detection Using Automatically Generated LabelsNils Gessert, Martin Gromniak, Marcel Bengs et al.
Electroencephalography (EEG) allows for source measurement of electrical brain activity. Particularly for inverse localization, the electrode positions on the scalp need to be known. Often, systems such as optical digitizing scanners are used for accurate localization with a stylus. However, the approach is time-consuming as each electrode needs to be scanned manually and the scanning systems are expensive. We propose using an RGBD camera to directly track electrodes in the images using deep learning methods. Studying and evaluating deep learning methods requires large amounts of labeled data. To overcome the time-consuming data annotation, we generate a large number of ground-truth labels using a robotic setup. We demonstrate that deep learning-based electrode detection is feasible with a mean absolute error of 5.69 +- 6.1mm and that our annotation scheme provides a useful environment for studying deep learning methods for electrode detection.
IVAug 12, 2019
Left Ventricle Quantification Using Direct Regression with Segmentation Regularization and Ensembles of Pretrained 2D and 3D CNNsNils Gessert, Alexander Schlaefer
Cardiac left ventricle (LV) quantification provides a tool for diagnosing cardiac diseases. Automatic calculation of all relevant LV indices from cardiac MR images is an intricate task due to large variations among patients and deformation during the cardiac cycle. Typical methods are based on segmentation of the myocardium or direct regression from MR images. To consider cardiac motion and deformation, recurrent neural networks and spatio-temporal convolutional neural networks (CNNs) have been proposed. We study an approach combining state-of-the-art models and emphasizing transfer learning to account for the small dataset provided for the LVQuan19 challenge. We compare 2D spatial and 3D spatio-temporal CNNs for LV indices regression and cardiac phase classification. To incorporate segmentation information, we propose an architecture-independent segmentation-based regularization. To improve the robustness further, we employ a search scheme that identifies the optimal ensemble from a set of architecture variants. Evaluating on the LVQuan19 Challenge training dataset with 5-fold cross-validation, we achieve mean absolute errors of 111 +- 76mm^2, 1.84 +- 0.9mm and 1.22 +- 0.6mm for area, dimension and regional wall thickness regression, respectively. The error rate for cardiac phase classification is 6.7%.
IVMay 22, 2019
Spatio-Temporal Deep Learning Models for Tip Force Estimation During Needle InsertionNils Gessert, Torben Priegnitz, Thore Saathoff et al.
Purpose. Precise placement of needles is a challenge in a number of clinical applications such as brachytherapy or biopsy. Forces acting at the needle cause tissue deformation and needle deflection which in turn may lead to misplacement or injury. Hence, a number of approaches to estimate the forces at the needle have been proposed. Yet, integrating sensors into the needle tip is challenging and a careful calibration is required to obtain good force estimates. Methods. We describe a fiber-optical needle tip force sensor design using a single OCT fiber for measurement. The fiber images the deformation of an epoxy layer placed below the needle tip which results in a stream of 1D depth profiles. We study different deep learning approaches to facilitate calibration between this spatio-temporal image data and the related forces. In particular, we propose a novel convGRU-CNN architecture for simultaneous spatial and temporal data processing. Results. The needle can be adapted to different operating ranges by changing the stiffness of the epoxy layer. Likewise, calibration can be adapted by training the deep learning models. Our novel convGRU-CNN architecture results in the lowest mean absolute error of 1.59 +- 1.3 mN and a cross-correlation coefficient of 0.9997, and clearly outperforms the other methods. Ex vivo experiments in human prostate tissue demonstrate the needle's application. Conclusions. Our OCT-based fiber-optical sensor presents a viable alternative for needle tip force estimation. The results indicate that the rich spatio-temporal information included in the stream of images showing the deformation throughout the epoxy layer can be effectively used by deep learning models. Particularly, we demonstrate that the convGRU-CNN architecture performs favorably, making it a promising approach for other spatio-temporal learning problems.
CVMay 20, 2019
Deep Transfer Learning Methods for Colon Cancer Classification in Confocal Laser Microscopy ImagesNils Gessert, Marcel Bengs, Lukas Wittig et al.
Purpose: The gold standard for colorectal cancer metastases detection in the peritoneum is histological evaluation of a removed tissue sample. For feedback during interventions, real-time in-vivo imaging with confocal laser microscopy has been proposed for differentiation of benign and malignant tissue by manual expert evaluation. Automatic image classification could improve the surgical workflow further by providing immediate feedback. Methods: We analyze the feasibility of classifying tissue from confocal laser microscopy in the colon and peritoneum. For this purpose, we adopt both classical and state-of-the-art convolutional neural networks to directly learn from the images. As the available dataset is small, we investigate several transfer learning strategies including partial freezing variants and full fine-tuning. We address the distinction of different tissue types, as well as benign and malignant tissue. Results: We present a thorough analysis of transfer learning strategies for colorectal cancer with confocal laser microscopy. In the peritoneum, metastases are classified with an AUC of 97.1 and in the colon, the primarius is classified with an AUC of 73.1. In general, transfer learning substantially improves performance over training from scratch. We find that the optimal transfer learning strategy differs for models and classification tasks. Conclusions: We demonstrate that convolutional neural networks and transfer learning can be used to identify cancer tissue with confocal laser microscopy. We show that there is no generally optimal transfer learning strategy and model as well as task-specific engineering is required. Given the high performance for the peritoneum, even with a small dataset, application for intraoperative decision support could be feasible.