Anders Bjorholm Dahl

CV
h-index16
21papers
91citations
Novelty42%
AI Score52

21 Papers

CVSep 20, 2024
Feature-Centered First Order Structure Tensor Scale-Space in 2D and 3D

Pawel Tomasz Pieta, Anders Bjorholm Dahl, Jeppe Revall Frisvad et al.

The structure tensor method is often used for 2D and 3D analysis of imaged structures, but its results are in many cases very dependent on the user's choice of method parameters. We simplify this parameter choice in first order structure tensor scale-space by directly connecting the width of the derivative filter to the size of image features. By introducing a ring-filter step, we substitute the Gaussian integration/smoothing with a method that more accurately shifts the derivative filter response from feature edges to their center. We further demonstrate how extracted structural measures can be used to correct known inaccuracies in the scale map, resulting in a reliable representation of the feature sizes both in 2D and 3D. Compared to the traditional first order structure tensor, or previous structure tensor scale-space approaches, our solution is much more accurate and can serve as an out-of-the-box method for extracting a wide range of structural parameters with minimal user input.

CVApr 4, 2023
SportsPose -- A Dynamic 3D sports pose dataset

Christian Keilstrup Ingwersen, Christian Mikkelstrup, Janus Nørtoft Jensen et al.

Accurate 3D human pose estimation is essential for sports analytics, coaching, and injury prevention. However, existing datasets for monocular pose estimation do not adequately capture the challenging and dynamic nature of sports movements. In response, we introduce SportsPose, a large-scale 3D human pose dataset consisting of highly dynamic sports movements. With more than 176,000 3D poses from 24 different subjects performing 5 different sports activities, SportsPose provides a diverse and comprehensive set of 3D poses that reflect the complex and dynamic nature of sports movements. Contrary to other markerless datasets we have quantitatively evaluated the precision of SportsPose by comparing our poses with a commercial marker-based system and achieve a mean error of 34.5 mm across all evaluation sequences. This is comparable to the error reported on the commonly used 3DPW dataset. We further introduce a new metric, local movement, which describes the movement of the wrist and ankle joints in relation to the body. With this, we show that SportsPose contains more movement than the Human3.6M and 3DPW datasets in these extremum joints, indicating that our movements are more dynamic. The dataset with accompanying code can be downloaded from our website. We hope that SportsPose will allow researchers and practitioners to develop and evaluate more effective models for the analysis of sports performance and injury prevention. With its realistic and diverse dataset, SportsPose provides a valuable resource for advancing the state-of-the-art in pose estimation in sports.

CVApr 4, 2023
BugNIST -- a Large Volumetric Dataset for Object Detection under Domain Shift

Patrick Møller Jensen, Vedrana Andersen Dahl, Carsten Gundlach et al.

Domain shift significantly influences the performance of deep learning algorithms, particularly for object detection within volumetric 3D images. Annotated training data is essential for deep learning-based object detection. However, annotating densely packed objects is time-consuming and costly. Instead, we suggest training models on individually scanned objects, causing a domain shift between training and detection data. To address this challenge, we introduce the BugNIST dataset, comprising 9154 micro-CT volumes of 12 bug types and 388 volumes of tightly packed bug mixtures. This dataset is characterized by having objects with the same appearance in the source and target domains, which is uncommon for other benchmark datasets for domain shift. During training, individual bug volumes labeled by class are utilized, while testing employs mixtures with center point annotations and bug type labels. Together with the dataset, we provide a baseline detection analysis, with the aim of advancing the field of 3D object detection methods.

CVMar 19
Towards High-Quality Image Segmentation: Improving Topology Accuracy by Penalizing Neighbor Pixels

Juan Miguel Valverde, Dim P. Papadopoulos, Rasmus Larsen et al.

Standard deep learning models for image segmentation cannot guarantee topology accuracy, failing to preserve the correct number of connected components or structures. This, in turn, affects the quality of the segmentations and compromises the reliability of the subsequent quantification analyses. Previous works have proposed to enhance topology accuracy with specialized frameworks, architectures, and loss functions. However, these methods are often cumbersome to integrate into existing training pipelines, they are computationally very expensive, or they are restricted to structures with tubular morphology. We present SCNP, an efficient method that improves topology accuracy by penalizing the logits with their poorest-classified neighbor, forcing the model to improve the prediction at the pixels' neighbors before allowing it to improve the pixels themselves. We show the effectiveness of SCNP across 13 datasets, covering different structure morphologies and image modalities, and integrate it into three frameworks for semantic and instance segmentation. Additionally, we show that SCNP can be integrated into several loss functions, making them improve topology accuracy. Our code can be found at https://jmlipman.github.io/SCNP-SameClassNeighborPenalization.

CVMar 24
VoDaSuRe: A Large-Scale Dataset Revealing Domain Shift in Volumetric Super-Resolution

August Leander Høeg, Sophia Wiinberg Bardenfleth, Hans Martin Kjer et al.

Recent advances in volumetric super-resolution (SR) have demonstrated strong performance in medical and scientific imaging, with transformer- and CNN-based approaches achieving impressive results even at extreme scaling factors. In this work, we show that much of this performance stems from training on downsampled data rather than real low-resolution scans. This reliance on downsampling is partly driven by the scarcity of paired high- and low-resolution 3D datasets. To address this, we introduce VoDaSuRe, a large-scale volumetric dataset containing paired high- and low-resolution scans. When training models on VoDaSuRe, we reveal a significant discrepancy: SR models trained on downsampled data produce substantially sharper predictions than those trained on real low-resolution scans, which smooth fine structures. Conversely, applying models trained on downsampled data to real scans preserves more structure but is inaccurate. Our findings suggest that current SR methods are overstated - when applied to real data, they do not recover structures lost in low-resolution scans and instead predict a smoothed average. We argue that progress in deep learning-based volumetric SR requires datasets with paired real scans of high complexity, such as VoDaSuRe. Our dataset and code are publicly available through: https://augusthoeg.github.io/VoDaSuRe/

CVNov 21, 2023
Two Views Are Better than One: Monocular 3D Pose Estimation with Multiview Consistency

Christian Keilstrup Ingwersen, Rasmus Tirsgaard, Rasmus Nylander et al.

Deducing a 3D human pose from a single 2D image is inherently challenging because multiple 3D poses can correspond to the same 2D representation. 3D data can resolve this pose ambiguity, but it is expensive to record and requires an intricate setup that is often restricted to controlled lab environments. We propose a method that improves the performance of deep learning-based monocular 3D human pose estimation models by using multiview data only during training, but not during inference. We introduce a novel loss function, consistency loss, which operates on two synchronized views. This approach is simpler than previous models that require 3D ground truth or intrinsic and extrinsic camera parameters. Our consistency loss penalizes differences in two pose sequences after rigid alignment. We also demonstrate that our consistency loss substantially improves performance for fine-tuning without requiring 3D data. Furthermore, we show that using our consistency loss can yield state-of-the-art performance when training models from scratch in a semi-supervised manner. Our findings provide a simple way to capture new data, e.g in a new domain. This data can be added using off-the-shelf cameras with no calibration requirements. We make all our code and data publicly available.

CVMay 13
Fast and Compact Graph Cuts for the Boykov-Kolmogorov Algorithm

Christian Møller Mikkelstrup, Anders Bjorholm Dahl, Philip Bille et al.

Computing a minimum $s$-$t$ cut in a graph is a solution to a wide range of computer vision problems, and is often done using the Boykov-Kolmogorov (BK) algorithm. In this paper, we revisit the BK algorithm from both a theoretical and practical point of view. We improve the analysis of the time complexity of the BK algorithm to $O(mn|C|)$ and propose a new algorithm, the fast and compact BK (fcBK) algorithm, with a time complexity of $O(m|C|)$, where $m$, $n$, and $|C|$ are the number of edges, number of vertices, and the capacity of the cut, respectively. We additionally propose a compact graph representation that allows our implementation to find a minimum $s$-$t$ cut in a graph with upwards of $10^9$ vertices and $10^{10}$ edges on a machine with 128 GB of memory. We find our implementation of the BK algorithm to be the fastest available implementation of the BK algorithm when evaluating on a comprehensive set of benchmark datasets, highlighting the importance of memory-efficient implementations. We make our implementations publicly available for further research and implementation development within minimum $s$-$t$ cut algorithms.

CVMar 5, 2025Code
TopoMortar: A dataset to evaluate image segmentation methods focused on topology accuracy

Juan Miguel Valverde, Motoya Koga, Nijihiko Otsuka et al.

We present TopoMortar, a brick wall dataset that is the first dataset specifically designed to evaluate topology-focused image segmentation methods, such as topology loss functions. Motivated by the known sensitivity of methods to dataset challenges, such as small training sets, noisy labels, and out-of-distribution test-set images, TopoMortar is created to enable in two ways investigating methods' effectiveness at improving topology accuracy. First, by eliminating dataset challenges that, as we show, impact the effectiveness of topology loss functions. Second, by allowing to represent different dataset challenges in the same dataset, isolating methods' performance from dataset challenges. TopoMortar includes three types of labels (accurate, pseudo-labels, and noisy labels), two fixed training sets (large and small), and in-distribution and out-of-distribution test-set images. We compared eight loss functions on TopoMortar, and we found that clDice achieved the most topologically accurate segmentations, and that the relative advantageousness of the other loss functions depends on the experimental setting. Additionally, we show that data augmentation and self-distillation can elevate Cross entropy Dice loss to surpass most topology loss functions, and that those simple methods can enhance topology loss functions as well. TopoMortar and our code can be found at https://jmlipman.github.io/TopoMortar

CVMar 19
Rethinking Uncertainty Quantification and Entanglement in Image Segmentation

Jakob Lønborg Christensen, Vedrana Andersen Dahl, Morten Rieger Hannemose et al.

Uncertainty quantification (UQ) is crucial in safety-critical applications such as medical image segmentation. Total uncertainty is typically decomposed into data-related aleatoric uncertainty (AU) and model-related epistemic uncertainty (EU). Many methods exist for modeling AU (such as Probabilistic UNet, Diffusion) and EU (such as ensembles, MC Dropout), but it is unclear how they interact when combined. Additionally, recent work has revealed substantial entanglement between AU and EU, undermining the interpretability and practical usefulness of the decomposition. We present a comprehensive empirical study covering a broad range of AU-EU model combinations, propose a metric to quantify uncertainty entanglement, and evaluate both across downstream UQ tasks. For out-of-distribution detection, ensembles exhibit consistently lower entanglement and superior performance. For ambiguity modeling and calibration the best models are dataset-dependent, with softmax/SSN-based methods performing well and Probabilistic UNets being less entangled. A softmax ensemble fares remarkably well on all tasks. Finally, we analyze potential sources of uncertainty entanglement and outline directions for mitigating this effect.

CVMar 7, 2025Code
Disconnect to Connect: A Data Augmentation Method for Improving Topology Accuracy in Image Segmentation

Juan Miguel Valverde, Maja Østergaard, Adrian Rodriguez-Palomo et al.

Accurate segmentation of thin, tubular structures (e.g., blood vessels) is challenging for deep neural networks. These networks classify individual pixels, and even minor misclassifications can break the thin connections within these structures. Existing methods for improving topology accuracy, such as topology loss functions, rely on very precise, topologically-accurate training labels, which are difficult to obtain. This is because annotating images, especially 3D images, is extremely laborious and time-consuming. Low image resolution and contrast further complicates the annotation by causing tubular structures to appear disconnected. We present CoLeTra, a data augmentation strategy that integrates to the models the prior knowledge that structures that appear broken are actually connected. This is achieved by creating images with the appearance of disconnected structures while maintaining the original labels. Our extensive experiments, involving different architectures, loss functions, and datasets, demonstrate that CoLeTra leads to segmentations topologically more accurate while often improving the Dice coefficient and Hausdorff distance. CoLeTra's hyper-parameters are intuitive to tune, and our sensitivity analysis shows that CoLeTra is robust to changes in these hyper-parameters. We also release a dataset specifically suited for image segmentation methods with a focus on topology accuracy. CoLetra's code can be found at https://github.com/jmlipman/CoLeTra.

CVJan 7, 2025
Materialist: Physically Based Editing Using Single-Image Inverse Rendering

Lezhong Wang, Duc Minh Tran, Ruiqi Cui et al.

Achieving physically consistent image editing remains a significant challenge in computer vision. Existing image editing methods typically rely on neural networks, which struggle to accurately handle shadows and refractions. Conversely, physics-based inverse rendering often requires multi-view optimization, limiting its practicality in single-image scenarios. In this paper, we propose Materialist, a method combining a learning-based approach with physically based progressive differentiable rendering. Given an image, our method leverages neural networks to predict initial material properties. Progressive differentiable rendering is then used to optimize the environment map and refine the material properties with the goal of closely matching the rendered result to the input image. Our approach enables a range of applications, including material editing, object insertion, and relighting, while also introducing an effective method for editing material transparency without requiring full scene geometry. Furthermore, Our envmap estimation method also achieves state-of-the-art performance, further enhancing the accuracy of image editing task. Experiments demonstrate strong performance across synthetic and real-world datasets, excelling even on challenging out-of-domain images. Project website: https://lez-s.github.io/materialist_project/

CVFeb 21
MaskDiME: Adaptive Masked Diffusion for Precise and Efficient Visual Counterfactual Explanations

Changlu Guo, Anders Nymark Christensen, Anders Bjorholm Dahl et al.

Visual counterfactual explanations aim to reveal the minimal semantic modifications that can alter a model's prediction, providing causal and interpretable insights into deep neural networks. However, existing diffusion-based counterfactual generation methods are often computationally expensive, slow to sample, and imprecise in localizing the modified regions. To address these limitations, we propose MaskDiME, a simple, fast, and effective diffusion framework that unifies semantic consistency and spatial precision through localized sampling. Our approach adaptively focuses on decision-relevant regions to achieve localized and semantically consistent counterfactual generation while preserving high image fidelity. Our training-free framework, MaskDiME, achieves over 30x faster inference than the baseline method and achieves comparable or state-of-the-art performance across five benchmark datasets spanning diverse visual domains, establishing a practical and generalizable solution for efficient counterfactual explanation.

GRSep 28, 2025
ReLumix: Extending Image Relighting to Video via Video Diffusion Models

Lezhong Wang, Shutong Jin, Ruiqi Cui et al.

Controlling illumination during video post-production is a crucial yet elusive goal in computational photography. Existing methods often lack flexibility, restricting users to certain relighting models. This paper introduces ReLumix, a novel framework that decouples the relighting algorithm from temporal synthesis, thereby enabling any image relighting technique to be seamlessly applied to video. Our approach reformulates video relighting into a simple yet effective two-stage process: (1) an artist relights a single reference frame using any preferred image-based technique (e.g., Diffusion Models, physics-based renderers); and (2) a fine-tuned stable video diffusion (SVD) model seamlessly propagates this target illumination throughout the sequence. To ensure temporal coherence and prevent artifacts, we introduce a gated cross-attention mechanism for smooth feature blending and a temporal bootstrapping strategy that harnesses SVD's powerful motion priors. Although trained on synthetic data, ReLumix shows competitive generalization to real-world videos. The method demonstrates significant improvements in visual fidelity, offering a scalable and versatile solution for dynamic lighting control.

CVSep 15, 2025
SA-UNetv2: Rethinking Spatial Attention U-Net for Retinal Vessel Segmentation

Changlu Guo, Anders Nymark Christensen, Anders Bjorholm Dahl et al.

Retinal vessel segmentation is essential for early diagnosis of diseases such as diabetic retinopathy, hypertension, and neurodegenerative disorders. Although SA-UNet introduces spatial attention in the bottleneck, it underuses attention in skip connections and does not address the severe foreground-background imbalance. We propose SA-UNetv2, a lightweight model that injects cross-scale spatial attention into all skip connections to strengthen multi-scale feature fusion and adopts a weighted Binary Cross-Entropy (BCE) plus Matthews Correlation Coefficient (MCC) loss to improve robustness to class imbalance. On the public DRIVE and STARE datasets, SA-UNetv2 achieves state-of-the-art performance with only 1.2MB memory and 0.26M parameters (less than 50% of SA-UNet), and 1 second CPU inference on 592 x 592 x 3 images, demonstrating strong efficiency and deployability in resource-constrained, CPU-only settings.

CVApr 8, 2025
Diffusion Based Ambiguous Image Segmentation

Jakob Lønborg Christensen, Morten Rieger Hannemose, Anders Bjorholm Dahl et al.

Medical image segmentation often involves inherent uncertainty due to variations in expert annotations. Capturing this uncertainty is an important goal and previous works have used various generative image models for the purpose of representing the full distribution of plausible expert ground truths. In this work, we explore the design space of diffusion models for generative segmentation, investigating the impact of noise schedules, prediction types, and loss weightings. Notably, we find that making the noise schedule harder with input scaling significantly improves performance. We conclude that x- and v-prediction outperform epsilon-prediction, likely because the diffusion process is in the discrete segmentation domain. Many loss weightings achieve similar performance as long as they give enough weight to the end of the diffusion process. We base our experiments on the LIDC-IDRI lung lesion dataset and obtain state-of-the-art (SOTA) performance. Additionally, we introduce a randomly cropped variant of the LIDC-IDRI dataset that is better suited for uncertainty in image segmentation. Our model also achieves SOTA in this harder setting.

CVApr 8, 2025
Fast Sphericity and Roundness approximation in 2D and 3D using Local Thickness

Pawel Tomasz Pieta, Peter Winkel Rasumssen, Anders Bjorholm Dahl et al.

Sphericity and roundness are fundamental measures used for assessing object uniformity in 2D and 3D images. However, using their strict definition makes computation costly. As both 2D and 3D microscopy imaging datasets grow larger, there is an increased demand for efficient algorithms that can quantify multiple objects in large volumes. We propose a novel approach for extracting sphericity and roundness based on the output of a local thickness algorithm. For sphericity, we simplify the surface area computation by modeling objects as spheroids/ellipses of varying lengths and widths of mean local thickness. For roundness, we avoid a complex corner curvature determination process by approximating it with local thickness values on the contour/surface of the object. The resulting methods provide an accurate representation of the exact measures while being significantly faster than their existing implementations.

CVMar 3, 2025
A General Purpose Spectral Foundational Model for Both Proximal and Remote Sensing Spectral Imaging

William Michael Laprade, Jesper Cairo Westergaard, Svend Christensen et al.

Spectral imaging data acquired via multispectral and hyperspectral cameras can have hundreds of channels, where each channel records the reflectance at a specific wavelength and bandwidth. Time and resource constraints limit our ability to collect large spectral datasets, making it difficult to build and train predictive models from scratch. In the RGB domain, we can often alleviate some of the limitations of smaller datasets by using pretrained foundational models as a starting point. However, most existing foundation models are pretrained on large datasets of 3-channel RGB images, severely limiting their effectiveness when used with spectral imaging data. The few spectral foundation models that do exist usually have one of two limitations: (1) they are built and trained only on remote sensing data limiting their application in proximal spectral imaging, (2) they utilize the more widely available multispectral imaging datasets with less than 15 channels restricting their use with hundred-channel hyperspectral images. To alleviate these issues, we propose a large-scale foundational model and dataset built upon the masked autoencoder architecture that takes advantage of spectral channel encoding, spatial-spectral masking and ImageNet pretraining for an adaptable and robust model for downstream spectral imaging tasks.

CVDec 6, 2024
MozzaVID: Mozzarella Volumetric Image Dataset

Pawel Tomasz Pieta, Peter Winkel Rasmussen, Anders Bjorholm Dahl et al.

Influenced by the complexity of volumetric imaging, there is a shortage of established datasets useful for benchmarking volumetric deep-learning models. As a consequence, new and existing models are not easily comparable, limiting the development of architectures optimized specifically for volumetric data. To counteract this trend, we introduce MozzaVID - a large, clean, and versatile volumetric classification dataset. Our dataset contains X-ray computed tomography (CT) images of mozzarella microstructure and enables the classification of 25 cheese types and 149 cheese samples. We provide data in three different resolutions, resulting in three dataset instances containing from 591 to 37,824 images. While being general-purpose, the dataset also facilitates investigating mozzarella structure properties. The structure of food directly affects its functional properties and thus its consumption experience. Understanding food structure helps tune the production and mimicking it enables sustainable alternatives to animal-derived food products. The complex and disordered nature of food structures brings a unique challenge, where a choice of appropriate imaging method, scale, and sample size is not trivial. With this dataset we aim to address these complexities, contributing to more robust structural analysis models. The dataset can be downloaded from: https://archive.compute.dtu.dk/files/public/projects/MozzaVID/.

IVNov 30, 2020
Sparse-View Spectral CT Reconstruction Using Deep Learning

Wail Mustafa, Christian Kehl, Ulrik Lund Olsen et al.

Spectral computed tomography (CT) is an emerging technology capable of providing high chemical specificity, which is crucial for many applications such as detecting threats in luggage. This type of application requires both fast and high-quality image reconstruction and is often based on sparse-view (few) projections. The conventional filtered back projection (FBP) method is fast but it produces low-quality images dominated by noise and artifacts in sparse-view CT. Iterative methods with, e.g., total variation regularizers can circumvent that but they are computationally expensive, as the computational load proportionally increases with the number of spectral channels. Instead, we propose an approach for fast reconstruction of sparse-view spectral CT data using a U-Net convolutional neural network architecture with multi-channel input and output. The network is trained to output high-quality CT images from FBP input image reconstructions. Our method is fast at run-time and because the internal convolutions are shared between the channels, the computational load increases only at the first and last layers, making it an efficient approach to process spectral data with a large number of channels. We have validated our approach using real CT scans. Our results show qualitatively and quantitatively that our approach outperforms the state-of-the-art iterative methods. Furthermore, the results indicate that the network can exploit the coupling of the channels to enhance the overall quality and robustness.

CVOct 28, 2018
Multi-Spectral Imaging via Computed Tomography (MUSIC) - Comparing Unsupervised Spectral Segmentations for Material Differentiation

Christian Kehl, Wail Mustafa, Jan Kehres et al.

Multi-spectral computed tomography is an emerging technology for the non-destructive identification of object materials and the study of their physical properties. Applications of this technology can be found in various scientific and industrial contexts, such as luggage scanning at airports. Material distinction and its identification is challenging, even with spectral x-ray information, due to acquisition noise, tomographic reconstruction artefacts and scanning setup application constraints. We present MUSIC - and open access multi-spectral CT dataset in 2D and 3D - to promote further research in the area of material identification. We demonstrate the value of this dataset on the image analysis challenge of object segmentation purely based on the spectral response of its composing materials. In this context, we compare the segmentation accuracy of fast adaptive mean shift (FAMS) and unconstrained graph cuts on both datasets. We further discuss the impact of reconstruction artefacts and segmentation controls on the achievable results. Dataset, related software packages and further documentation are made available to the imaging community in an open-access manner to promote further data-driven research on the subject

CVSep 6, 2018
Content-based Propagation of User Markings for Interactive Segmentation of Patterned Images

Vedrana Andersen Dahl, Monica Jane Emerson, Camilla Himmelstrup Trinderup et al.

Efficient and easy segmentation of images and volumes is of great practical importance. Segmentation problems that motivate our approach originate from microscopy imaging commonly used in materials science, medicine, and biology. We formulate image segmentation as a probabilistic pixel classification problem, and we apply segmentation as a step towards characterising image content. Our method allows the user to define structures of interest by interactively marking a subset of pixels. Thanks to the real-time feedback, the user can place new markings strategically, depending on the current outcome. The final pixel classification may be obtained from a very modest user input. An important ingredient of our method is a graph that encodes image content. This graph is built in an unsupervised manner during initialisation and is based on clustering of image features. Since we combine a limited amount of user-labelled data with the clustering information obtained from the unlabelled parts of the image, our method fits in the general framework of semi-supervised learning. We demonstrate how this can be a very efficient approach to segmentation through pixel classification.