CV · Apr 14, 2023
Domain shifts in dermoscopic skin cancer datasets: Evaluation of essential limitations for clinical translation
Katharina Fogelberg, Sireesha Chamarthi, Roman C. Maron et al.
The limited ability of Convolutional Neural Networks (CNNs) to generalize to images from previously unseen domains is a major limitation, in particular for safety-critical clinical tasks such as dermoscopic skin cancer classification. To translate CNN-based applications into the clinic, it is essential that they can adapt to domain shifts. Such new conditions can arise from the use of different image acquisition systems or varying lighting conditions. In dermoscopy, shifts can also occur as a change in patient age or as rare lesion localizations (e.g. palms). These are not prominently represented in most training datasets and can therefore lead to a decrease in performance. To verify the generalizability of classification models in real-world clinical settings, it is crucial to have access to data that mimics such domain shifts. To our knowledge, no dermoscopic image dataset exists in which such domain shifts are properly described and quantified. We therefore grouped publicly available images from the ISIC archive based on their metadata (e.g. acquisition location, lesion localization, patient age) to generate meaningful domains. To verify that these domains are in fact distinct, we used multiple quantification measures to estimate the presence and intensity of domain shifts. Additionally, we analyzed the performance on these domains with and without an unsupervised domain adaptation technique. We observed that domain shifts do exist in most of our grouped domains. Based on our results, we believe these datasets to be helpful for testing the generalization capabilities of dermoscopic skin cancer classifiers.
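The abstract does not name its quantification measures. As one commonly used example, assumed here and not necessarily among the measures in the paper, the maximum mean discrepancy (MMD) compares two domains through their feature vectors; the sketch below estimates the squared MMD with an RBF kernel, and the arrays in the demo are synthetic stand-ins for CNN features of two image groups.

```python
import numpy as np


def rbf_mmd2(x, y, gamma=None):
    """Biased (V-statistic) estimate of the squared MMD between x (n, d) and y (m, d).

    Uses the RBF kernel k(a, b) = exp(-gamma * ||a - b||^2); gamma defaults to 1/d,
    a simple heuristic assumed here for illustration only.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    if gamma is None:
        gamma = 1.0 / x.shape[1]

    def kernel_mean(a, b):
        # pairwise squared Euclidean distances between the rows of a and b
        d2 = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T
        return np.exp(-gamma * np.maximum(d2, 0.0)).mean()

    return kernel_mean(x, x) + kernel_mean(y, y) - 2.0 * kernel_mean(x, y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.normal(0.0, 1.0, size=(200, 8))    # synthetic "source domain" features
    same = rng.normal(0.0, 1.0, size=(200, 8))      # drawn from the same distribution
    shifted = rng.normal(0.5, 1.0, size=(200, 8))   # mean-shifted "target domain"
    print("same:   ", rbf_mmd2(source, same))
    print("shifted:", rbf_mmd2(source, shifted))
```

A larger value indicates a stronger shift; in practice a permutation test is usually added to decide whether a value is significant.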
LG · Dec 21, 2022
Is it worth it? Comparing six deep and classical methods for unsupervised anomaly detection in time series
Ferdinand Rewicki, Joachim Denzler, Julia Niebling
Detecting anomalies in time series data is important in a variety of fields, including system monitoring, healthcare, and cybersecurity. The abundance of available methods makes it difficult to choose the most appropriate one for a given application, and each method has its own strengths in detecting certain types of anomalies. In this study, we compare six unsupervised anomaly detection methods of varying complexity to determine whether more complex methods generally perform better and whether certain methods are better suited to certain types of anomalies. We evaluated the methods on the UCR anomaly archive, a recent benchmark dataset for anomaly detection, and analyzed the results at the dataset and anomaly-type level after tuning the necessary hyperparameters for each method. Additionally, we assessed the ability of each method to incorporate prior knowledge about anomalies and examined the differences between point-wise and sequence-wise features. Our experiments show that classical machine learning methods generally outperform deep learning methods across a range of anomaly types.
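To make "sequence-wise features" concrete, the sketch below scores every length-m window (rather than every point) by the distance to its nearest non-overlapping window, a classical discord-style approach. It is an illustrative assumption, not one of the six methods named in the paper (the abstract does not list them); the window length and the synthetic series are arbitrary.

```python
import numpy as np


def discord_scores(ts, m):
    """Distance from each length-m subsequence to its nearest non-overlapping neighbour.

    Brute force, O(n^2 * m); large scores mark candidate (discord) anomalies.
    """
    ts = np.asarray(ts, dtype=float)
    subs = np.lib.stride_tricks.sliding_window_view(ts, m)  # shape (n, m)
    n = subs.shape[0]
    # z-normalise each window so that shape, not offset or scale, is compared
    mu = subs.mean(axis=1, keepdims=True)
    sd = subs.std(axis=1, keepdims=True)
    z = (subs - mu) / np.where(sd > 0, sd, 1.0)
    scores = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(z - z[i], axis=1)
        # ignore trivial matches: windows that overlap window i
        d[max(0, i - m + 1): min(n, i + m)] = np.inf
        scores[i] = d.min()
    return scores


if __name__ == "__main__":
    t = np.linspace(0, 20 * np.pi, 1000)   # about ten periods of a sine wave
    series = np.sin(t)
    series[500:530] = 0.0                  # inject a flat, anomalous segment
    s = discord_scores(series, m=100)
    print("top discord starts at index", int(np.argmax(s)))
```

A point-wise method would instead score each sample on its own, which can miss anomalies such as this flat segment whose individual values lie within the normal range.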
LG · Aug 23, 2024
Functional Tensor Decompositions for Physics-Informed Neural Networks
Sai Karthikeya Vemuri, Tim Büchner, Julia Niebling et al.
Physics-Informed Neural Networks (PINNs) have shown continuous and increasing promise in approximating partial differential equations (PDEs), although they remain constrained by the curse of dimensionality. In this paper, we propose a generalized PINN version of the classical separation-of-variables method. To do this, we first use the universal approximation theorem to show that a multivariate function can be approximated by the outer product of neural networks whose inputs are separated variables. We leverage tensor decomposition forms to separate the variables in a PINN setting. By employing Canonical Polyadic (CP), Tensor-Train (TT), and Tucker decomposition forms within the PINN framework, we create robust architectures for learning multivariate functions from separate neural networks connected by outer products. Our methodology significantly enhances the performance of PINNs, as evidenced by improved results on complex high-dimensional PDEs, including the 3D Helmholtz and 5D Poisson equations, among others. This research underscores the potential of tensor decomposition-based, variable-separated PINNs to surpass the state of the art, offering a compelling solution to the dimensionality challenge in PDE approximation.
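In CP form, the separated approximation reads u(x_1, ..., x_D) ≈ Σ_r ∏_k f_k(x_k)_r, where each f_k is a small network of one variable with R outputs. The sketch below shows only this combination step with random, untrained weights; the network shape (`tiny_mlp`) and all sizes are hypothetical, no PDE residual or training is included, and the TT and Tucker variants are not shown.

```python
import numpy as np


def tiny_mlp(x, params):
    """Hypothetical one-hidden-layer network mapping a 1D coordinate grid (n,) to (n, R)."""
    w1, b1, w2 = params
    return np.tanh(x[:, None] * w1 + b1) @ w2


def cp_combine(factors):
    """CP (canonical polyadic) combination of per-variable feature matrices.

    factors[k] has shape (n_k, R); the result u has shape (n_1, ..., n_D) with
    u[i_1, ..., i_D] = sum_r prod_k factors[k][i_k, r].
    """
    rank = factors[0].shape[1]
    if any(f.shape[1] != rank for f in factors):
        raise ValueError("all factors must share the same rank R")
    idx = "abcdefghijklmnopqrstuvwxy"[: len(factors)]  # 'z' is reserved for the rank
    spec = ",".join(c + "z" for c in idx) + "->" + idx
    return np.einsum(spec, *factors)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    R, H = 4, 16                                  # rank and hidden width (arbitrary)
    grids = [np.linspace(0, 1, n) for n in (10, 12, 14)]
    nets = [(rng.normal(size=H), rng.normal(size=H), rng.normal(size=(H, R)))
            for _ in grids]                       # random, untrained weights
    u = cp_combine([tiny_mlp(g, p) for g, p in zip(grids, nets)])
    print(u.shape)  # (10, 12, 14): full 3D grid from three 1D network evaluations
```

The network cost grows with the sum of the grid sizes (10 + 12 + 14 evaluations) instead of their product, which is the practical appeal of separating variables in higher dimensions.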
CV · Oct 5, 2023
Mitigating the Influence of Domain Shift in Skin Lesion Classification: A Benchmark Study of Unsupervised Domain Adaptation Methods on Dermoscopic Images
Sireesha Chamarthi, Katharina Fogelberg, Roman C. Maron et al.
The performance of deep neural networks in skin lesion classification has already been shown to be on par with, if not superior to, dermatologists' diagnoses. However, the performance of these models usually deteriorates when the test data differ significantly from the training data (i.e. domain shift). This limitation poses a risk to patients when such models are used in real-world skin lesion classification tasks. For example, different image acquisition systems or previously unseen anatomical sites on the patient can suffice to cause such domain shifts. Mitigating the negative effect of such shifts is therefore crucial, but developing effective methods to address domain shift has proven challenging. In this study, we carry out an in-depth analysis of eight different unsupervised domain adaptation methods to assess their effectiveness in improving generalization for dermoscopic datasets. To ensure the robustness of our findings, we test each method on a total of ten distinct datasets, thereby covering a variety of possible domain shifts. In addition, we investigate which factors in the domain-shifted datasets affect the effectiveness of domain adaptation methods. Our findings show that all eight domain adaptation methods improve the area under the precision-recall curve (AUPRC) for the majority of analyzed datasets. Altogether, these results indicate that unsupervised domain adaptation generally leads to performance improvements for the binary melanoma-nevus classification task, regardless of the nature of the domain shift. However, small or heavily imbalanced datasets lead to less consistent results, because these factors influence the methods' performance.
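AUPRC is often estimated as average precision (AP); the abstract does not say which estimator the study used, so the sketch below is an assumption. Labels follow a hypothetical convention of 1 = melanoma (positive class) and 0 = nevus.

```python
import numpy as np


def average_precision(y_true, scores):
    """Average precision, a common estimator of the area under the precision-recall curve.

    AP = (1 / #positives) * sum over positives of precision at that positive's rank,
    with samples ranked by decreasing score (ties keep their input order).
    """
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    y = y_true[order] == 1                        # True where the ranked sample is positive
    n_pos = y.sum()
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive samples")
    precision_at_k = np.cumsum(y) / np.arange(1, len(y) + 1)
    return float(precision_at_k[y].sum() / n_pos)


if __name__ == "__main__":
    labels = [1, 0, 1, 0]                 # hypothetical: melanoma, nevus, melanoma, nevus
    probs = [0.9, 0.8, 0.7, 0.1]          # hypothetical classifier outputs
    print(average_precision(labels, probs))
```

A random classifier's expected AP equals the fraction of positives, so AUPRC values are most directly comparable between datasets with similar class balance.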
CV · Oct 23, 2024
Exploiting Text-Image Latent Spaces for the Description of Visual Concepts
Laines Schmalwasser, Jakob Gawlikowski, Joachim Denzler et al.
Concept Activation Vectors (CAVs) offer insights into neural network decision-making by linking human-friendly concepts to the model's internal feature extraction process. However, when a new set of CAVs is discovered, each CAV must still be translated into a human-understandable description. For image-based neural networks, this is typically done by visualizing the most relevant images of a CAV, while determining the concept is left to humans. In this work, we introduce an approach that aids the interpretation of newly discovered concept sets by suggesting textual descriptions for each CAV. This is done by mapping the most relevant images representing a CAV into a text-image embedding space, where a joint description of these relevant images can be computed. We propose encoding the most relevant receptive fields instead of full images. We demonstrate the capabilities of this approach in multiple experiments with and without given CAV labels, showing that the proposed approach provides accurate descriptions for the CAVs and reduces the challenge of concept interpretation.
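A simple way to compute a "joint description" in a shared text-image space is to average the unit-normalized embeddings of a CAV's most relevant image regions and rank candidate texts by cosine similarity to that centroid. This is an assumed illustration, not necessarily the paper's procedure; the embeddings are presumed to come from some CLIP-like encoder, and `describe_cav` and the toy vectors are hypothetical.

```python
import numpy as np


def _unit(v):
    """Scale vectors (along the last axis) to unit L2 norm."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def describe_cav(image_embs, text_embs, labels, top=1):
    """Rank candidate descriptions for one CAV in a shared text-image embedding space.

    image_embs: (k, d) embeddings of the CAV's most relevant image regions
    text_embs:  (t, d) embeddings of candidate descriptions, aligned with labels
    """
    centroid = _unit(_unit(image_embs).mean(axis=0))  # joint direction of the k regions
    sims = _unit(text_embs) @ centroid                # cosine similarity per candidate
    return [labels[i] for i in np.argsort(-sims)[:top]]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = ["stripes", "dots", "fur", "sky"]
    text_embs = np.eye(4)                                      # toy "text" embeddings
    image_embs = np.eye(4)[1] + 0.1 * rng.normal(size=(5, 4))  # five regions near "dots"
    print(describe_cav(image_embs, text_embs, labels))
```

Encoding cropped receptive fields rather than full images would only change what goes into `image_embs`; the ranking step stays the same.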
LG · May 23, 2025
FastCAV: Efficient Computation of Concept Activation Vectors for Explaining Deep Neural Networks
Laines Schmalwasser, Niklas Penzel, Joachim Denzler et al.
Concepts such as objects, patterns, and shapes are how humans understand the world. Building on this intuition, concept-based explainability methods aim to study the representations learned by deep neural networks in relation to human-understandable concepts. Here, Concept Activation Vectors (CAVs) are an important tool and can identify whether a model has learned a concept. However, the computational cost and time requirements of existing CAV computation pose a significant challenge, particularly for large-scale, high-dimensional architectures. To address this limitation, we introduce FastCAV, a novel approach that accelerates the extraction of CAVs by up to 63.6x (46.4x on average). We provide a theoretical foundation for our approach and give concrete assumptions under which it is equivalent to established SVM-based methods. Our empirical results demonstrate that CAVs calculated with FastCAV maintain similar performance while being more efficient and stable. In downstream applications, namely concept-based explanation methods, we show that FastCAV can act as a replacement, leading to equivalent insights. Hence, our approach enables previously infeasible investigations of deep models, which we demonstrate by tracking the evolution of concepts during model training.
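A CAV is typically the normal vector of a linear classifier, such as an SVM, that separates a layer's activations for concept examples from those for random examples. As a cheaper illustration, assumed here and not taken from the abstract, the sketch below uses the normalized difference of the two class means, which needs no iterative training; FastCAV's exact formulation and its equivalence assumptions are given in the paper and may differ. The activations in the test are synthetic.

```python
import numpy as np


def mean_difference_cav(concept_acts, random_acts):
    """Unit concept direction from the difference of class means, plus a midpoint offset.

    concept_acts: (n_c, d) layer activations for concept examples
    random_acts:  (n_r, d) layer activations for random counter-examples
    """
    mu_c, mu_r = concept_acts.mean(axis=0), random_acts.mean(axis=0)
    direction = mu_c - mu_r
    return direction / np.linalg.norm(direction), 0.5 * (mu_c + mu_r)


def separation_accuracy(cav, midpoint, concept_acts, random_acts):
    """Fraction of activations on the correct side of the hyperplane through midpoint."""
    correct = ((concept_acts - midpoint) @ cav > 0).sum() + \
              ((random_acts - midpoint) @ cav <= 0).sum()
    return correct / (len(concept_acts) + len(random_acts))
```

The cost is a single pass over the activations, O((n_c + n_r) d), which is why mean-based directions are attractive for wide layers; an SVM additionally accounts for the covariance of the data, so the two directions generally coincide only under extra assumptions.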
CL · Jun 27, 2025
Analyzing and Fine-Tuning Whisper Models for Multilingual Pilot Speech Transcription in the Cockpit
Kartheek Kumar Reddy Nareddy, Sarah Ternus, Julia Niebling
Developments in transformer encoder-decoder architectures have led to significant breakthroughs in machine translation, Automatic Speech Recognition (ASR), and instruction-based chatbots, among other applications. The pre-trained models were trained on vast amounts of generic data over a few epochs (fewer than five in most cases), resulting in strong generalization capabilities. Nevertheless, the performance of these models suffers when they are applied to niche domains such as transcribing pilot speech in the cockpit, which involves a lot of specific vocabulary and multilingual conversations. This paper investigates and improves the transcription accuracy of cockpit conversations with Whisper models. We collected around 85 minutes of cockpit simulator recordings and 130 minutes of interview recordings with pilots and labeled them manually. The speakers are middle-aged men speaking both German and English. To improve the accuracy of transcriptions, we propose multiple normalization schemes that refine the transcripts and improve the Word Error Rate (WER). We then employ fine-tuning to enhance ASR performance, using parameter-efficient fine-tuning with Low-Rank Adaptation (LoRA). With this approach, WER decreased from 68.49% (baseline: pretrained Whisper Large model without normalization) to 26.26% (fine-tuned Whisper Large model with the proposed normalization scheme).
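WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The sketch below computes it with dynamic programming and shows why normalization matters; the `normalize` function (lowercasing, stripping punctuation) is a hypothetical minimal scheme, not one of the paper's proposed schemes, and the example phrases are made up.

```python
import re


def normalize(text):
    """Hypothetical normalization: lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def wer(reference, hypothesis):
    """Word Error Rate = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j] (one DP row)
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(wer("Flaps 15, Checked.", "flaps 15 checked"))                        # raw
    print(wer(normalize("Flaps 15, Checked."), normalize("flaps 15 checked")))  # normalized
```

Without normalization, case and punctuation differences count as full substitutions, so the raw score overstates errors that carry no meaning for a transcript reader.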
LG · Jan 13, 2025
Anomalous Agreement: How to find the Ideal Number of Anomaly Classes in Correlated, Multivariate Time Series Data
Ferdinand Rewicki, Joachim Denzler, Julia Niebling
Detecting and classifying abnormal system states is critical for condition monitoring, but supervised methods often fall short due to the rarity of anomalies and the lack of labeled data. Therefore, clustering is often used to group similar abnormal behavior. However, evaluating cluster quality without ground truth is challenging, as existing measures such as the Silhouette Score (SSC) only evaluate the cohesion and separation of clusters and ignore possible prior knowledge about the data. To address this challenge, we introduce the Synchronized Anomaly Agreement Index (SAAI), which exploits the synchronicity of anomalies across multivariate time series to assess cluster quality. We demonstrate the effectiveness of SAAI by showing that maximizing SAAI improves accuracy on the task of finding the true number of anomaly classes K in correlated time series by 0.23 compared to SSC and by 0.32 compared to X-Means. We also show that clusters obtained by maximizing SAAI are easier to interpret than those obtained with SSC.
LG · Jun 14, 2024
Unraveling Anomalies in Time: Unsupervised Discovery and Isolation of Anomalous Behavior in Bio-regenerative Life Support System Telemetry
Ferdinand Rewicki, Jakob Gawlikowski, Julia Niebling et al.
The detection of abnormal or critical system states is essential in condition monitoring. While much attention is given to promptly identifying anomalies, a retrospective analysis of these anomalies can significantly enhance our understanding of the underlying causes of observed undesired behavior. This becomes particularly critical when the monitored system is deployed in a vital environment. In this study, we investigate anomalies in Bio-Regenerative Life Support Systems (BLSS) for space exploration and analyze anomalies found in telemetry data from the EDEN ISS space greenhouse in Antarctica. We apply time series clustering to anomaly detection results to categorize various types of anomalies in both univariate and multivariate settings. We then assess the effectiveness of these methods in identifying systematic anomalous behavior. Additionally, we illustrate that the anomaly detection methods MDI and DAMP produce complementary results, as indicated by previous research.