CVSep 25, 2024
Parameter-efficient Bayesian Neural Networks for Uncertainty-aware Depth EstimationRichard D. Paul, Alessio Quercia, Vincent Fortuin et al.
State-of-the-art computer vision tasks, like monocular depth estimation (MDE), rely heavily on large, modern Transformer-based architectures. However, their application in safety-critical domains demands reliable predictive performance and uncertainty quantification. While Bayesian neural networks provide a conceptually simple approach to serve those requirements, they suffer from the high dimensionality of the parameter space. Parameter-efficient fine-tuning (PEFT) methods, in particular low-rank adaptations (LoRA), have emerged as a popular strategy for adapting large-scale models to down-stream tasks by performing parameter inference on lower-dimensional subspaces. In this work, we investigate the suitability of PEFT methods for subspace Bayesian inference in large-scale Transformer-based vision models. We show that, indeed, combining BitFit, DiffFit, LoRA, and CoLoRA, a novel LoRA-inspired PEFT method, with Bayesian inference enables more robust and reliable predictive performance in MDE.
LGDec 11, 2025
Classifier Reconstruction Through Counterfactual-Aware Wasserstein PrototypesXuan Zhao, Zhuo Cao, Arya Bangun et al.
Counterfactual explanations provide actionable insights by identifying minimal input changes required to achieve a desired model prediction. Beyond their interpretability benefits, counterfactuals can also be leveraged for model reconstruction, where a surrogate model is trained to replicate the behavior of a target model. In this work, we demonstrate that model reconstruction can be significantly improved by recognizing that counterfactuals, which typically lie close to the decision boundary, can serve as informative though less representative samples for both classes. This is particularly beneficial in settings with limited access to labeled data. We propose a method that integrates original data samples with counterfactuals to approximate class prototypes using the Wasserstein barycenter, thereby preserving the underlying distributional structure of each class. This approach enhances the quality of the surrogate model and mitigates the issue of decision boundary shift, which commonly arises when counterfactuals are naively treated as ordinary training instances. Empirical results across multiple datasets show that our method improves fidelity between the surrogate and target models, validating its effectiveness.
LGFeb 3
Least but not Last: Fine-tuning Intermediate Principal Components for Better Performance-Forgetting Trade-OffsAlessio Quercia, Arya Bangun, Ira Assent et al.
Low-Rank Adaptation (LoRA) methods have emerged as crucial techniques for adapting large pre-trained models to downstream tasks under computational and memory constraints. However, they face a fundamental challenge in balancing task-specific performance gains against catastrophic forgetting of pre-trained knowledge, where existing methods provide inconsistent recommendations. This paper presents a comprehensive analysis of the performance-forgetting trade-offs inherent in low-rank adaptation using principal components as initialization. Our investigation reveals that fine-tuning intermediate components leads to better balance and show more robustness to high learning rates than first (PiSSA) and last (MiLoRA) components in existing work. Building on these findings, we provide a practical approach for initialization of LoRA that offers superior trade-offs. We demonstrate in a thorough empirical study on a variety of computer vision and NLP tasks that our approach improves accuracy and reduces forgetting, also in continual learning scenarios.
IVDec 7, 2025
Physics-Guided Diffusion Priors for Multi-Slice Reconstruction in Scientific ImagingLaurentius Valdy, Richard D. Paul, Alessio Quercia et al.
Accurate multi-slice reconstruction from limited measurement data is crucial to speed up the acquisition process in medical and scientific imaging. However, it remains challenging due to the ill-posed nature of the problem and the high computational and memory demands. We propose a framework that addresses these challenges by integrating partitioned diffusion priors with physics-based constraints. By doing so, we substantially reduce memory usage per GPU while preserving high reconstruction quality, outperforming both physics-only and full multi-slice reconstruction baselines for different modalities, namely Magnetic Resonance Imaging (MRI) and four-dimensional Scanning Transmission Electron Microscopy (4D-STEM). Additionally, we show that the proposed method improves in-distribution accuracy as well as strong generalization to out-of-distribution datasets.
LGNov 10, 2025
FlowTIE: Flow-based Transport of Intensity Equation for Phase Gradient Estimation from 4D-STEM DataArya Bangun, Maximilian Töllner, Xuan Zhao et al.
We introduce FlowTIE, a neural-network-based framework for phase reconstruction from 4D-Scanning Transmission Electron Microscopy (STEM) data, which integrates the Transport of Intensity Equation (TIE) with a flow-based representation of the phase gradient. This formulation allows the model to bridge data-driven learning with physics-based priors, improving robustness under dynamical scattering conditions for thick specimen. The validation on simulated datasets of crystalline materials, benchmarking to classical TIE and gradient-based optimization methods are presented. The results demonstrate that FlowTIE improves phase reconstruction accuracy, fast, and can be integrated with a thick specimen model, namely multislice method.
CVJan 22, 2025Code
Enhancing Monocular Depth Estimation with Multi-Source Auxiliary TasksAlessio Quercia, Erenus Yildiz, Zhuo Cao et al.
Monocular depth estimation (MDE) is a challenging task in computer vision, often hindered by the cost and scarcity of high-quality labeled datasets. We tackle this challenge using auxiliary datasets from related vision tasks for an alternating training scheme with a shared decoder built on top of a pre-trained vision foundation model, while giving a higher weight to MDE. Through extensive experiments we demonstrate the benefits of incorporating various in-domain auxiliary datasets and tasks to improve MDE quality on average by ~11%. Our experimental analysis shows that auxiliary tasks have different impacts, confirming the importance of task selection, highlighting that quality gains are not achieved by merely adding data. Remarkably, our study reveals that using semantic segmentation datasets as Multi-Label Dense Classification (MLDC) often results in additional quality gains. Lastly, our method significantly improves the data efficiency for the considered MDE datasets, enhancing their quality while reducing their size by at least 80%. This paves the way for using auxiliary data from related tasks to improve MDE quality despite limited availability of high-quality labeled data. Code is available at https://jugit.fz-juelich.de/ias-8/mdeaux.
IVDec 25, 2024
MRI Reconstruction with Regularized 3D Diffusion Model (R3DM)Arya Bangun, Zhuo Cao, Alessio Quercia et al.
Magnetic Resonance Imaging (MRI) is a powerful imaging technique widely used for visualizing structures within the human body and in other fields such as plant sciences. However, there is a demand to develop fast 3D-MRI reconstruction algorithms to show the fine structure of objects from under-sampled acquisition data, i.e., k-space data. This emphasizes the need for efficient solutions that can handle limited input while maintaining high-quality imaging. In contrast to previous methods only using 2D, we propose a 3D MRI reconstruction method that leverages a regularized 3D diffusion model combined with optimization method. By incorporating diffusion based priors, our method improves image quality, reduces noise, and enhances the overall fidelity of 3D MRI reconstructions. We conduct comprehensive experiments analysis on clinical and plant science MRI datasets. To evaluate the algorithm effectiveness for under-sampled k-space data, we also demonstrate its reconstruction performance with several undersampling patterns, as well as with in- and out-of-distribution pre-trained data. In experiments, we show that our method improves upon tested competitors.
CVDec 4, 2024
Equivariant Representation Learning for Augmentation-based Self-Supervised Learning via Image ReconstructionQin Wang, Kai Krajsek, Hanno Scharr
Augmentation-based self-supervised learning methods have shown remarkable success in self-supervised visual representation learning, excelling in learning invariant features but often neglecting equivariant ones. This limitation reduces the generalizability of foundation models, particularly for downstream tasks requiring equivariance. We propose integrating an image reconstruction task as an auxiliary component in augmentation-based self-supervised learning algorithms to facilitate equivariant feature learning without additional parameters. Our method implements a cross-attention mechanism to blend features learned from two augmented views, subsequently reconstructing one of them. This approach is adaptable to various datasets and augmented-pair based learning methods. We evaluate its effectiveness on learning equivariant features through multiple linear regression tasks and downstream applications on both artificial (3DIEBench) and natural (ImageNet) datasets. Results consistently demonstrate significant improvements over standard augmentation-based self-supervised learning methods and state-of-the-art approaches, particularly excelling in scenarios involving combined augmentations. Our method enhances the learning of both invariant and equivariant features, leading to more robust and generalizable visual representations for computer vision tasks.
CVMar 12, 2025
How To Make Your Cell Tracker Say "I dunno!"Richard D. Paul, Johannes Seiffarth, David Rügamer et al.
Cell tracking is a key computational task in live-cell microscopy, but fully automated analysis of high-throughput imaging requires reliable and, thus, uncertainty-aware data analysis tools, as the amount of data recorded within a single experiment exceeds what humans are able to overlook. We here propose and benchmark various methods to reason about and quantify uncertainty in linear assignment-based cell tracking algorithms. Our methods take inspiration from statistics and machine learning, leveraging two perspectives on the cell tracking problem explored throughout this work: Considering it as a Bayesian inference problem and as a classification problem. Our methods admit a framework-like character in that they equip any frame-to-frame tracking method with uncertainty quantification. We demonstrate this by applying it to various existing tracking algorithms including the recently presented Transformer-based trackers. We demonstrate empirically that our methods yield useful and well-calibrated tracking uncertainties.
LGOct 16, 2025
Galaxy Morphology Classification with Counterfactual ExplanationZhuo Cao, Lena Krieger, Hanno Scharr et al.
Galaxy morphologies play an essential role in the study of the evolution of galaxies. The determination of morphologies is laborious for a large amount of data giving rise to machine learning-based approaches. Unfortunately, most of these approaches offer no insight into how the model works and make the results difficult to understand and explain. We here propose to extend a classical encoder-decoder architecture with invertible flow, allowing us to not only obtain a good predictive performance but also provide additional information about the decision process with counterfactual explanations.
CVMar 11, 2025
1LoRA: Summation Compression for Very Low-Rank AdaptationAlessio Quercia, Zhuo Cao, Arya Bangun et al.
Parameter-Efficient Fine-Tuning (PEFT) methods have transformed the approach to fine-tuning large models for downstream tasks by enabling the adjustment of significantly fewer parameters than those in the original model matrices. In this work, we study the "very low rank regime", where we fine-tune the lowest amount of parameters per linear layer for each considered PEFT method. We propose 1LoRA (Summation Low-Rank Adaptation), a compute, parameter and memory efficient fine-tuning method which uses the feature sum as fixed compression and a single trainable vector as decompression. Differently from state-of-the-art PEFT methods like LoRA, VeRA, and the recent MoRA, 1LoRA uses fewer parameters per layer, reducing the memory footprint and the computational cost. We extensively evaluate our method against state-of-the-art PEFT methods on multiple fine-tuning tasks, and show that our method not only outperforms them, but is also more parameter, memory and computationally efficient. Moreover, thanks to its memory efficiency, 1LoRA allows to fine-tune more evenly across layers, instead of focusing on specific ones (e.g. attention layers), improving performance further.
CVNov 12, 2024
Retrieval of sun-induced plant fluorescence in the O$_2$-A absorption band from DESIS imageryJim Buffat, Miguel Pato, Kevin Alonso et al.
We provide the first method allowing to retrieve spaceborne SIF maps at 30 m ground resolution with a strong correlation ($r^2=0.6$) to high-quality airborne estimates of sun-induced fluorescence (SIF). SIF estimates can provide explanatory information for many tasks related to agricultural management and physiological studies. While SIF products from airborne platforms are accurate and spatially well resolved, the data acquisition of such products remains science-oriented and limited to temporally constrained campaigns. Spaceborne SIF products on the other hand are available globally with often sufficient revisit times. However, the spatial resolution of spaceborne SIF products is too small for agricultural applications. In view of ESA's upcoming FLEX mission we develop a method for SIF retrieval in the O$_2$-A band of hyperspectral DESIS imagery to provide first insights for spaceborne SIF retrieval at high spatial resolution. To this end, we train a simulation-based self-supervised network with a novel perturbation based regularizer and test performance improvements under additional supervised regularization of atmospheric variable prediction. In a validation study with corresponding HyPlant derived SIF estimates at 740 nm we find that our model reaches a mean absolute difference of 0.78 mW / nm / sr / m$^2$.
QMNov 6, 2024
EAP4EMSIG -- Experiment Automation Pipeline for Event-Driven Microscopy to Smart Microfluidic Single-Cells AnalysisNils Friederich, Angelo Jovin Yamachui Sitcheu, Annika Nassal et al.
Microfluidic Live-Cell Imaging (MLCI) generates high-quality data that allows biotechnologists to study cellular growth dynamics in detail. However, obtaining these continuous data over extended periods is challenging, particularly in achieving accurate and consistent real-time event classification at the intersection of imaging and stochastic biology. To address this issue, we introduce the Experiment Automation Pipeline for Event-Driven Microscopy to Smart Microfluidic Single-Cells Analysis (EAP4EMSIG). In particular, we present initial zero-shot results from the real-time segmentation module of our approach. Our findings indicate that among four State-Of-The- Art (SOTA) segmentation methods evaluated, Omnipose delivers the highest Panoptic Quality (PQ) score of 0.9336, while Contour Proposal Network (CPN) achieves the fastest inference time of 185 ms with the second-highest PQ score of 0.8575. Furthermore, we observed that the vision foundation model Segment Anything is unsuitable for this particular use case.
LGOct 16, 2025
LeapFactual: Reliable Visual Counterfactual Explanation Using Conditional Flow MatchingZhuo Cao, Xuan Zhao, Lena Krieger et al.
The growing integration of machine learning (ML) and artificial intelligence (AI) models into high-stakes domains such as healthcare and scientific research calls for models that are not only accurate but also interpretable. Among the existing explainable methods, counterfactual explanations offer interpretability by identifying minimal changes to inputs that would alter a model's prediction, thus providing deeper insights. However, current counterfactual generation methods suffer from critical limitations, including gradient vanishing, discontinuous latent spaces, and an overreliance on the alignment between learned and true decision boundaries. To overcome these limitations, we propose LeapFactual, a novel counterfactual explanation algorithm based on conditional flow matching. LeapFactual generates reliable and informative counterfactuals, even when true and learned decision boundaries diverge. Following a model-agnostic approach, LeapFactual is not limited to models with differentiable loss functions. It can even handle human-in-the-loop systems, expanding the scope of counterfactual explanations to domains that require the participation of human annotators, such as citizen science. We provide extensive experiments on benchmark and real-world datasets showing that LeapFactual generates accurate and in-distribution counterfactual explanations that offer actionable insights. We observe, for instance, that our reliable counterfactual samples with labels aligning to ground truth can be beneficially used as new training data to enhance the model. The proposed method is broadly applicable and enhances both scientific knowledge discovery and non-expert interpretability.
QMMar 30, 2025
EAP4EMSIG -- Enhancing Event-Driven Microscopy for Microfluidic Single-Cell AnalysisNils Friederich, Angelo Jovin Yamachui Sitcheu, Annika Nassal et al.
Microfluidic Live-Cell Imaging (MLCI) yields data on microbial cell factories. However, continuous acquisition is challenging as high-throughput experiments often lack real-time insights, delaying responses to stochastic events. We introduce three components in the Experiment Automation Pipeline for Event-Driven Microscopy to Smart Microfluidic Single-Cell Analysis (EAP4EMSIG): a fast, accurate Multi-Layer Perceptron (MLP)-based autofocusing method predicting the focus offset, an evaluation of real-time segmentation methods and a real-time data analysis dashboard. Our MLP-based autofocusing achieves a Mean Absolute Error (MAE) of 0.105 $μ$m with inference times from 87 ms. Among eleven evaluated Deep Learning (DL) segmentation methods, Cellpose reached a Panoptic Quality (PQ) of 93.36 %, while a distance-based method was fastest (121 ms, Panoptic Quality 93.02 %).
IVMar 28, 2025
Efficient Epistemic Uncertainty Estimation in Cerebrovascular SegmentationOmini Rathore, Richard Paul, Abigail Morrison et al.
Brain vessel segmentation of MR scans is a critical step in the diagnosis of cerebrovascular diseases. Due to the fine vessel structure, manual vessel segmentation is time consuming. Therefore, automatic deep learning (DL) based segmentation techniques are intensively investigated. As conventional DL models yield a high complexity and lack an indication of decision reliability, they are often considered as not trustworthy. This work aims to increase trust in DL based models by incorporating epistemic uncertainty quantification into cerebrovascular segmentation models for the first time. By implementing an efficient ensemble model combining the advantages of Bayesian Approximation and Deep Ensembles, we aim to overcome the high computational costs of conventional probabilistic networks. Areas of high model uncertainty and erroneous predictions are aligned which demonstrates the effectiveness and reliability of the approach. We perform extensive experiments applying the ensemble model on out-of-distribution (OOD) data. We demonstrate that for OOD-images, the estimated uncertainty increases. Additionally, omitting highly uncertain areas improves the segmentation quality, both for in- and out-of-distribution data. The ensemble model explains its limitations in a reliable manner and can maintain trustworthiness also for OOD data and could be considered in clinical applications
CVMar 24, 2025
Self-Supervised Learning based on Transformed Image Reconstruction for Equivariance-Coherent Feature RepresentationQin Wang, Benjamin Bruns, Hanno Scharr et al.
The equivariant behaviour of features is essential in many computer vision tasks, yet popular self-supervised learning (SSL) methods tend to constrain equivariance by design. We propose a self-supervised learning approach where the system learns transformations independently by reconstructing images that have undergone previously unseen transformations. Specifically, the model is tasked to reconstruct intermediate transformed images, e.g. translated or rotated images, without prior knowledge of these transformations. This auxiliary task encourages the model to develop equivariance-coherent features without relying on predefined transformation rules. To this end, we apply transformations to the input image, generating an image pair, and then split the extracted features into two sets per image. One set is used with a usual SSL loss encouraging invariance, the other with our loss based on the auxiliary task to reconstruct the intermediate transformed images. Our loss and the SSL loss are linearly combined with weighted terms. Evaluating on synthetic tasks with natural images, our proposed method strongly outperforms all competitors, regardless of whether they are designed to learn equivariance. Furthermore, when trained alongside augmentation-based methods as the invariance tasks, such as iBOT or DINOv2, we successfully learn a balanced combination of invariant and equivariant features. Our approach performs strong on a rich set of realistic computer vision downstream tasks, almost always improving over all baselines.
IVNov 8, 2024
Untrained Perceptual Loss for image denoising of line-like structures in MR imagesElisabeth Pfaehler, Daniel Pflugfelder, Hanno Scharr
In the acquisition of Magnetic Resonance (MR) images shorter scan times lead to higher image noise. Therefore, automatic image denoising using deep learning methods is of high interest. MR images containing line-like structures such as roots or vessels yield special characteristics as they display connected structures and yield sparse information. For this kind of data, it is important to consider voxel neighborhoods when training a denoising network. In this paper, we translate the Perceptual Loss to 3D data by comparing feature maps of untrained networks in the loss function as done previously for 2D data. We tested the performance of untrained Perceptual Loss (uPL) on 3D image denoising of MR images displaying brain vessels (MR angiograms - MRA) and images of plant roots in soil. We investigate the impact of various uPL characteristics such as weight initialization, network depth, kernel size, and pooling operations on the results. We tested the performance of the uPL loss on four Rician noise levels using evaluation metrics such as the Structural Similarity Index Metric (SSIM). We observe, that our uPL outperforms conventional loss functions such as the L1 loss or a loss based on the Structural Similarity Index Metric (SSIM). The uPL network's initialization is not important, while network depth and pooling operations impact denoising performance. E.g. for both datasets a network with five convolutional layers led to the best performance while a network with more layers led to a performance drop. We also find that small uPL networks led to better or comparable results than using large networks such as VGG. We observe superior performance of our loss for both datasets, all noise levels, and three network architectures. In conclusion, for images containing line-like structures, uPL is an alternative to other loss functions for 3D image denoising.
CVOct 28, 2021
Towards Large-Scale Rendering of Simulated Crops for Synthetic Ground Truth Generation on Modular SupercomputersDirk Norbert Helmrich, Jens Henrik Göbbert, Mona Giraud et al.
Computer Vision problems deal with the semantic extraction of information from camera images. Especially for field crop images, the underlying problems are hard to label and even harder to learn, and the availability of high-quality training data is low. Deep neural networks do a good job of extracting the necessary models from training examples. However, they rely on an abundance of training data that is not feasible to generate or label by expert annotation. To address this challenge, we make use of the Unreal Engine to render large and complex virtual scenes. We rely on the performance of individual nodes by distributing plant simulations across nodes and both generate scenes as well as train neural networks on GPUs, restricting node communication to parallel learning.
CVSep 4, 2017
ARIGAN: Synthetic Arabidopsis Plants using Generative Adversarial NetworkMario Valerio Giuffrida, Hanno Scharr, Sotirios A Tsaftaris
In recent years, there has been an increasing interest in image-based plant phenotyping, applying state-of-the-art machine learning approaches to tackle challenging problems, such as leaf segmentation (a multi-instance problem) and counting. Most of these algorithms need labelled data to learn a model for the task at hand. Despite the recent release of a few plant phenotyping datasets, large annotated plant image datasets for the purpose of training deep learning algorithms are lacking. One common approach to alleviate the lack of training data is dataset augmentation. Herein, we propose an alternative solution to dataset augmentation for plant phenotyping, creating artificial images of plants using generative neural networks. We propose the Arabidopsis Rosette Image Generator (through) Adversarial Network: a deep convolutional network that is able to generate synthetic rosette-shaped plants, inspired by DCGAN (a recent adversarial network model using convolutional layers). Specifically, we trained the network using A1, A2, and A4 of the CVPPP 2017 LCC dataset, containing Arabidopsis Thaliana plants. We show that our model is able to generate realistic 128x128 colour images of plants. We train our network conditioning on leaf count, such that it is possible to generate plants with a given number of leaves suitable, among others, for training regression based models. We propose a new Ax dataset of artificial plants images, obtained by our ARIGAN. We evaluate this new dataset using a state-of-the-art leaf counting algorithm, showing that the testing error is reduced when Ax is used as part of the training data.