Tommy Löfstedt

LG
h-index: 33
19 papers
245 citations
Novelty: 42%
AI Score: 50

19 Papers

LG · May 4 · Code
A Unified Framework for Tabular Generative Modeling: Loss Functions, Benchmarks, and Improved Multi-objective Bayesian Optimization Approaches

Minh H. Vu, Daniel Edler, Carl Wibom et al.

Deep learning (DL) models require extensive data to achieve strong performance and generalization. Deep generative models (DGMs) offer a solution by synthesizing data. Yet current approaches for tabular data often fail to preserve feature correlations and distributions during training, struggle with multi-metric hyperparameter selection, and lack comprehensive evaluation protocols. We address this gap with a unified framework that integrates training, hyperparameter tuning, and evaluation. First, we introduce a novel correlation- and distribution-aware loss function that regularizes DGMs, enhancing their ability to generate synthetic tabular data that faithfully represents the underlying data distributions. Theoretical analysis establishes stability and consistency guarantees. To enable principled hyperparameter search via Bayesian optimization (BO), we also propose a new multi-objective aggregation strategy based on iterative objective refinement Bayesian optimization (IORBO), along with a comprehensive statistical testing framework. We validate the proposed approach using a benchmarking framework with twenty real-world datasets and ten established tabular DGM baselines. The correlation-aware loss function significantly improves synthetic data fidelity and downstream machine learning (ML) performance, while IORBO consistently outperforms standard Bayesian optimization (SBO) in hyperparameter selection. The unified framework advances tabular generative modeling beyond isolated method improvements. Code is available at: https://github.com/vuhoangminh/TabGen-Framework

LG · Jun 3
Generalized TV--$\ell_p$ Structured Priors for Bayesian $T_1$ Mapping

Disi Lin, Martin Berggren, Tommy Löfstedt

We propose an extended family of structured spatial priors that incorporates the total variation (TV) function with $\ell_p$ norms. The prior is proven to be proper and incorporated into a Bayesian regression framework to enable uncertainty quantification in $T_1$ mapping, with posterior inference performed using the No-U-Turn Sampler (NUTS). This TV--$\ell_p$ construction is proven to constitute a well-defined family of prior distributions, and it naturally enforces spatial consistency and smooth variations in the estimated parameter maps. The method was evaluated in comparison to maximum-likelihood estimation and several Bayesian alternative priors based on the uniform, Gamma, and bounded TV priors. The evaluation includes experiments on synthetic brain and cardiac $T_1$ mapping datasets, as well as a real in-vivo breast $T_1$ mapping dataset. The results show that the TV--$\ell_p$ prior yields more concentrated posterior densities, indicating reduced uncertainty. It also consistently achieves lower variance and smaller (negative) bias, leading to more reliable estimates. Overall, embedding a TV-based structured penalty along with $\ell_p$ norms in a prior in a Bayesian model improves spatial coherence in $T_1$ maps and enhances uncertainty quantification, offering a robust approach for $T_1$ mapping with uncertainties.

CV · Jul 21, 2023 · Code
LatentAugment: Data Augmentation via Guided Manipulation of GAN's Latent Space

Lorenzo Tronchin, Minh H. Vu, Paolo Soda et al.

Data Augmentation (DA) is a technique to increase the quantity and diversity of the training data, and by that alleviate overfitting and improve generalisation. However, standard DA produces synthetic data for augmentation with limited diversity. Generative Adversarial Networks (GANs) may unlock additional information in a dataset by generating synthetic samples having the appearance of real images. However, these models struggle to simultaneously address three key requirements: fidelity and high-quality samples; diversity and mode coverage; and fast sampling. Indeed, GANs generate high-quality samples rapidly, but have poor mode coverage, limiting their adoption in DA applications. We propose LatentAugment, a DA strategy that overcomes the low diversity of GANs, opening them up for use in DA applications. Without external supervision, LatentAugment modifies latent vectors and moves them into latent space regions to maximise the synthetic images' diversity and fidelity. It is also agnostic to the dataset and the downstream task. A wide set of experiments shows that LatentAugment improves the generalisation of a deep model translating from MRI to CT, beating both standard DA and GAN-based sampling. Moreover, still in comparison with GAN-based sampling, LatentAugment synthetic samples show superior mode coverage and diversity. Code is available at: https://github.com/ltronchin/LatentAugment.

LG · Oct 20, 2022
Reproducibility of the Methods in Medical Imaging with Deep Learning

Attila Simkó, Anders Garpebring, Joakim Jonsson et al.

Concerns about the reproducibility of deep learning research are more prominent than ever, with no clear solution in sight. The relevance of machine learning research can only be improved if we also employ empirical rigor that incorporates reproducibility guidelines, especially so in the medical imaging field. The Medical Imaging with Deep Learning (MIDL) conference has made advancements in this direction by advocating open access, and recently also recommending that authors make their code public, both aspects being adopted by the majority of the conference submissions. This helps the reproducibility of the methods; however, there is currently little or no support for further evaluation of this supplementary material, making it vulnerable to poor quality, which affects the impact of the entire submission. We have evaluated all accepted full paper submissions to MIDL between 2018 and 2022 using established, but slightly adjusted guidelines on reproducibility and the quality of the public repositories. The evaluations show that publishing repositories and using public datasets are becoming more popular, which helps traceability, but the quality of the repositories has not improved over the years, leaving room for improvement in every aspect of designing repositories. Merely 22% of all submissions contain a repository that was deemed repeatable using our evaluations. From the commonly encountered issues during the evaluations, we propose a set of guidelines for machine learning-related research for medical imaging applications, adjusted specifically for future submissions to MIDL.

CR · Sep 11, 2024
A Cost-Aware Approach to Adversarial Robustness in Neural Networks

Charles Meyers, Mohammad Reza Saleh Sedghpour, Tommy Löfstedt et al.

Considering the growing prominence of production-level AI and the threat of adversarial attacks that can evade a model at run-time, evaluating the robustness of models to these evasion attacks is of critical importance. Additionally, testing model changes likely means deploying the models to a real-world system (e.g. a car, a medical imaging device, or a drone) to see how they affect performance, making untested changes a public problem that reduces development speed, increases cost of development, and makes it difficult (if not impossible) to parse cause from effect. In this work, we used survival analysis as a cloud-native, time-efficient and precise method for predicting model performance in the presence of adversarial noise. For neural networks in particular, the relationships between the learning rate, batch size, training time, convergence time, and deployment cost are highly complex, so researchers generally rely on benchmark datasets to assess the ability of a model to generalize beyond the training data. To address this, we propose using accelerated failure time models to measure the effect of hardware choice, batch size, number of epochs, and test-set accuracy by using adversarial attacks to induce failures on a reference model architecture before deploying the model to the real world. We evaluate several GPU types and use the Tree Parzen Estimator to maximize model robustness and minimize model run-time simultaneously. This provides a way to evaluate the model and optimise it in a single step, while simultaneously allowing us to model the effect of model parameters on training time, prediction time, and accuracy. Using this technique, we demonstrate that newer, more-powerful hardware does decrease the training time, but with a monetary and power cost that far outpaces the marginal gains in accuracy.

CV · Mar 31, 2025
Beyond a Single Mode: GAN Ensembles for Diverse Medical Data Generation

Lorenzo Tronchin, Tommy Löfstedt, Paolo Soda et al.

The advancement of generative AI, particularly in medical imaging, confronts the trilemma of ensuring high fidelity, diversity, and efficiency in synthetic data generation. While Generative Adversarial Networks (GANs) have shown promise across various applications, they still face challenges like mode collapse and insufficient coverage of real data distributions. This work explores the use of GAN ensembles to overcome these limitations, specifically in the context of medical imaging. By solving a multi-objective optimisation problem that balances fidelity and diversity, we propose a method for selecting an optimal ensemble of GANs tailored for medical data. The selected ensemble is capable of generating diverse synthetic medical images that are representative of true data distributions and computationally efficient. Each model in the ensemble brings a unique contribution, ensuring minimal redundancy. We conducted a comprehensive evaluation using three distinct medical datasets, testing 22 different GAN architectures with various loss functions and regularisation techniques. By sampling models at different training epochs, we crafted 110 unique configurations. The results highlight the capability of GAN ensembles to enhance the quality and utility of synthetic medical images, thereby improving the efficacy of downstream tasks such as diagnostic modelling.

CV · Mar 2, 2025
Using Synthetic Images to Augment Small Medical Image Datasets

Minh H. Vu, Lorenzo Tronchin, Tufve Nyholm et al.

Recent years have witnessed a growing academic and industrial interest in deep learning (DL) for medical imaging. To perform well, DL models require very large labeled datasets. However, most medical imaging datasets are small, with a limited number of annotated samples. They are usually small because delineating medical images is time-consuming and demanding for oncologists. There are various techniques that can be used to augment a dataset, for example, applying affine transformations or elastic transformations to available images, or adding synthetic images generated by a Generative Adversarial Network (GAN). In this work, we have developed a novel conditional variant of a current GAN method, the StyleGAN2, to generate multi-modal high-resolution medical images with the purpose of augmenting small medical imaging datasets with these synthetic images. We use the synthetic and real images from six datasets to train models for the downstream task of semantic segmentation. The quality of the generated medical images and the effect of this augmentation on the segmentation performance were evaluated afterward. Finally, the results indicate that the downstream segmentation models did not benefit from the generated images. Further work and analyses are required to establish how this augmentation affects the segmentation performance.

LG · Mar 6
Tiny, Hardware-Independent, Compression-based Classification

Charles Meyers, Aaron MacSween, Erik Elmroth et al.

The recent developments in machine learning have highlighted a conflict between online platforms and their users in terms of privacy. The importance of user privacy and the struggle for power over user data has been intensified as regulators and operators attempt to police online platforms. As users have become increasingly aware of privacy issues, client-side data storage, management, and analysis have become a favoured approach to large-scale centralised machine learning. However, state-of-the-art machine learning methods require vast amounts of labelled user data, making them unsuitable for models that reside client-side and only have access to a single user's data. State-of-the-art methods are also computationally expensive, which degrades the user experience on compute-limited hardware and also reduces battery life. A recent alternative approach has proven remarkably successful in classification tasks across a wide variety of data -- using a compression-based distance measure (called normalised compression distance) to measure the distance between generic objects in classical distance-based machine learning methods. In this work, we demonstrate that the normalised compression distance is actually not a metric; develop it for the wider context of kernel methods to allow modelling of complex data; and present techniques to improve the training time of models that use this distance measure. We demonstrate that the normalised compression distance works as well as and sometimes better than other metrics and kernels -- while requiring only marginally more computational cost and in spite of the lack of formal metric properties. The end result is a simple model with remarkable accuracy even when trained on a very small number of samples, allowing for models that are small and effective enough to run entirely on a client device using only user-supplied data.
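
For readers unfamiliar with the normalised compression distance mentioned in this abstract, the following is a minimal sketch of its usual textbook form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed length. The choice of zlib as the compressor and the function names `compressed_size` and `ncd` are assumptions made for illustration; the paper's own compressor and implementation are not described here.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (zlib is an assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalised compression distance between two byte strings.

    Uses the common definition
        (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed length and xy is the concatenation.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 20
    other = bytes(range(256)) * 4
    # Similar inputs should give a smaller value than dissimilar ones.
    print("ncd(text, text): ", ncd(text, text))
    print("ncd(text, other):", ncd(text, other))
```

With a real compressor, `ncd(x, x)` is typically small but not exactly zero, which is one simple way to see why the measure can fail the formal metric axioms.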

LG · Aug 16, 2025
Fairness Regularization in Federated Learning

Zahra Kharaghani, Ali Dadras, Tommy Löfstedt

Federated Learning (FL) has emerged as a vital paradigm in modern machine learning that enables collaborative training across decentralized data sources without exchanging raw data. This approach not only addresses privacy concerns but also allows access to overall substantially larger and potentially more diverse datasets, without the need for centralized storage or hardware resources. However, heterogeneity in client data may cause certain clients to have disproportionate impacts on the global model, leading to disparities in the clients' performances. Fairness, therefore, becomes a crucial concern in FL and can be addressed in various ways. However, the effectiveness of existing fairness-aware methods, particularly in heterogeneous data settings, remains unclear, and the relationships between different approaches are not well understood. In this work, we focus on performance equitable fairness, which aims to minimize differences in performance across clients. We restrict our study to fairness-aware methods that explicitly regularize client losses, evaluating both existing and newly proposed approaches. We identify and theoretically explain connections between the investigated fairness methods, and empirically show that FairGrad (approximate) and FairGrad* (exact) (two variants of a gradient variance regularization method introduced here for performance equitable fairness) improve both fairness and overall model performance in heterogeneous data settings.

LG · Mar 27, 2025
Provable Reduction in Communication Rounds for Non-Smooth Convex Federated Learning

Karlo Palenzuela, Ali Dadras, Alp Yurtsever et al.

Multiple local steps are key to communication-efficient federated learning. However, theoretical guarantees for such algorithms, without data heterogeneity-bounding assumptions, have been lacking in general non-smooth convex problems. Leveraging projection-efficient optimization methods, we propose FedMLS, a federated learning algorithm with provable improvements from multiple local steps. FedMLS attains an $ε$-suboptimal solution in $\mathcal{O}(1/ε)$ communication rounds, requiring a total of $\mathcal{O}(1/ε^2)$ stochastic subgradient oracle calls.

LG · Jan 24, 2024
A Training Rate and Survival Heuristic for Inference and Robustness Evaluation (TRASHFIRE)

Charles Meyers, Mohammad Reza Saleh Sedghpour, Tommy Löfstedt et al.

Machine learning models -- deep neural networks in particular -- have performed remarkably well on benchmark datasets across a wide variety of domains. However, the ease of finding adversarial counter-examples remains a persistent problem when training times are measured in hours or days and the time needed to find a successful adversarial counter-example is measured in seconds. Much work has gone into generating and defending against these adversarial counter-examples; however, the relative costs of attacks and defences are rarely discussed. Additionally, machine learning research is almost entirely guided by test/train metrics, but these would require billions of samples to meet industry standards. The present work addresses the problem of understanding and predicting how particular model hyper-parameters influence the performance of a model in the presence of an adversary. The proposed approach uses survival models, worst-case examples, and a cost-aware analysis to precisely and accurately reject a particular model change during routine model training procedures rather than relying on real-world deployment, expensive formal verification methods, or accurate simulations of very complicated systems (e.g., digitally recreating every part of a car or a plane). Through an evaluation of many pre-processing techniques, adversarial counter-examples, and neural network configurations, the conclusion is that deeper models do offer marginal gains in survival times compared to more shallow counterparts. However, we show that those gains are driven more by the model inference time than inherent robustness properties. Using the proposed methodology, we show that ResNet is hopelessly insecure against even the simplest of white box attacks.

IV · Apr 22, 2021
A Data-Adaptive Loss Function for Incomplete Data and Incremental Learning in Semantic Image Segmentation

Minh H. Vu, Gabriella Norman, Tufve Nyholm et al.

In recent years, deep learning has dramatically improved the performance in a variety of medical image analysis applications. Among different types of deep learning models, convolutional neural networks have been among the most successful and they have been used in many applications in medical imaging. Training deep convolutional neural networks often requires large amounts of image data to generalize well to new unseen images. It is often time-consuming and expensive to collect large amounts of data in the medical image domain due to expensive imaging systems, and the need for experts to manually make ground truth annotations. A potential problem arises if new structures are added when a decision support system is already deployed and in use. Since the field of radiation therapy is constantly developing, the new structures would also have to be covered by the decision support system. In the present work, we propose a novel loss function that adapts to the available data in order to utilize all available data, even when some have missing annotations. We demonstrate that the proposed loss function also works well in an incremental learning setting, where it can automatically incorporate new structures as they appear. Experiments on a large in-house data set show that the proposed method performs on par with baseline models, while greatly reducing the training time.

IV · Nov 16, 2020
Multi-Decoder Networks with Multi-Denoising Inputs for Tumor Segmentation

Minh H. Vu, Tufve Nyholm, Tommy Löfstedt

Automatic segmentation of brain glioma from multimodal MRI scans plays a key role in clinical trials and practice. Unfortunately, manual segmentation is very challenging, time-consuming, costly, and often inaccurate despite human expertise due to the high variance and high uncertainty in the human annotations. In the present work, we develop an end-to-end deep-learning-based segmentation method using a multi-decoder architecture by jointly learning three separate sub-problems using a partly shared encoder. We also propose to apply smoothing methods to the input images to generate denoised versions as additional inputs to the network. The validation performance indicates an improvement when using the proposed method. The proposed method was ranked 2nd in the task of Quantification of Uncertainty in Segmentation in the Brain Tumors in Multimodal Magnetic Resonance Imaging Challenge 2020.

CV · Mar 2, 2020
A Question-Centric Model for Visual Question Answering in Medical Imaging

Minh H. Vu, Tommy Löfstedt, Tufve Nyholm et al.

Deep learning methods have proven extremely effective at performing a variety of medical image analysis tasks. With their potential use in clinical routine, their lack of transparency has however been one of their few weak points, raising concerns regarding their behavior and failure modes. While most research to infer model behavior has focused on indirect strategies that estimate prediction uncertainties and visualize model support in the input image space, the ability to explicitly query a prediction model regarding its image content offers a more direct way to determine the behavior of trained models. To this end, we present a novel Visual Question Answering approach that allows an image to be queried by means of a written question. Experiments on a variety of medical and natural image datasets show that by fusing image and question features in a novel way, the proposed approach achieves an equal or higher accuracy compared to current methods.

IV · Dec 19, 2019
Evaluation of Multi-Slice Inputs to Convolutional Neural Networks for Medical Image Segmentation

Minh H. Vu, Guus Grimbergen, Tufve Nyholm et al.

When using Convolutional Neural Networks (CNNs) for segmentation of organs and lesions in medical images, the conventional approach is to work with inputs and outputs either as single slices (2D) or whole volumes (3D). One common alternative, in this study denoted as pseudo-3D, is to use a stack of adjacent slices as input and produce a prediction for at least the central slice. This approach gives the network the possibility to capture 3D spatial information, with only a minor additional computational cost. In this study, we systematically evaluate the segmentation performance and computational costs of this pseudo-3D approach as a function of the number of input slices, and compare the results to conventional end-to-end 2D and 3D CNNs. The standard pseudo-3D method regards the neighboring slices as multiple input image channels. We additionally evaluate a simple approach where the input stack is a volumetric input that is repeatedly convolved in 3D to obtain a 2D feature map. This 2D map is in turn fed into a standard 2D network. We conducted experiments using two different CNN backbone architectures and on five diverse data sets covering different anatomical regions, imaging modalities, and segmentation tasks. We found that while both pseudo-3D methods can process a large number of slices at once and still be computationally much more efficient than fully 3D CNNs, a significant improvement over a regular 2D CNN was only observed for one of the five data sets. An analysis of the structural properties of the segmentation masks revealed no relations to the segmentation performance with respect to the number of input slices. The conclusion is therefore that in the general case, multi-slice inputs appear to not significantly improve segmentation results over using 2D or 3D CNNs.
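
The standard pseudo-3D input described in this abstract (neighbouring slices treated as input channels) can be sketched as below. This is only an illustration under stated assumptions: the function name `pseudo3d_stacks`, the `(depth, height, width)` layout, and the edge-replication padding at the volume boundaries are choices made here, not details taken from the paper.

```python
import numpy as np


def pseudo3d_stacks(volume: np.ndarray, num_slices: int) -> np.ndarray:
    """Build one multi-channel 2D input per slice of a 3D volume.

    volume:     array of shape (depth, height, width)
    num_slices: odd number of adjacent slices to stack as channels
    returns:    array of shape (depth, num_slices, height, width), where
                entry [i] is centred on slice i of the volume.
    """
    if num_slices < 1 or num_slices % 2 == 0:
        raise ValueError("num_slices must be a positive odd integer")
    half = num_slices // 2
    # Assumption: replicate the first/last slice so border slices get full stacks.
    padded = np.pad(volume, ((half, half), (0, 0), (0, 0)), mode="edge")
    return np.stack([padded[i:i + num_slices] for i in range(volume.shape[0])])


if __name__ == "__main__":
    vol = np.arange(5 * 2 * 2).reshape(5, 2, 2)
    stacks = pseudo3d_stacks(vol, 3)
    print(stacks.shape)  # one 3-channel input per slice
```

Each stack can then be fed to an ordinary 2D network whose first layer accepts `num_slices` input channels, with the central channel corresponding to the slice being segmented.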

IV · Oct 16, 2019
End-to-End Cascaded U-Nets with a Localization Network for Kidney Tumor Segmentation

Minh H. Vu, Guus Grimbergen, Attila Simkó et al.

Kidney tumor segmentation emerges as a new frontier of computer vision in medical imaging. This is partly due to its challenging manual annotation and great medical impact. Within the scope of the Kidney Tumor Segmentation Challenge 2019, which aims at combined kidney and tumor segmentation, this work proposes a novel combination of 3D U-Nets (collectively denoted TuNet) that utilizes the resulting kidney masks for the subsequent tumor segmentation. The proposed method achieves a Sørensen-Dice coefficient score of 0.902 for the kidney, and 0.408 for the tumor segmentation, computed from a five-fold cross-validation on the 210 patients available in the data.
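
As a reminder of how the Sørensen-Dice coefficient reported in this abstract is defined, the sketch below computes 2|A ∩ B| / (|A| + |B|) for two binary masks. The function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions for illustration; challenge evaluation scripts may handle that edge case differently.

```python
import numpy as np


def dice_coefficient(pred, target) -> float:
    """Sørensen-Dice coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")
    total = pred.sum() + target.sum()
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    overlap = np.logical_and(pred, target).sum()
    return float(2.0 * overlap / total)


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    reference = [[1, 0], [1, 0]]
    print(dice_coefficient(predicted, reference))
```

The value lies in [0, 1]; some papers report it scaled by 100 instead.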

IV · Oct 11, 2019
TuNet: End-to-end Hierarchical Brain Tumor Segmentation using Cascaded Networks

Minh H. Vu, Tufve Nyholm, Tommy Löfstedt

Glioma is one of the most common types of brain tumors; it arises in the glial cells in the human brain and in the spinal cord. In addition to having a high mortality rate, glioma treatment is also very expensive. Hence, automatic and accurate segmentation and measurement from the early stages are critical in order to prolong the survival rates of the patients and to reduce the costs of the treatment. In the present work, we propose a novel end-to-end cascaded network for semantic segmentation that utilizes the hierarchical structure of the tumor sub-regions with ResNet-like blocks and Squeeze-and-Excitation modules after each convolution and concatenation block. By utilizing cross-validation, an average ensemble technique, and a simple post-processing technique, we obtained dice scores of 88.06, 80.84, and 80.29, and Hausdorff Distances (95th percentile) of 6.10, 5.17, and 2.21 for the whole tumor, tumor core, and enhancing tumor, respectively, on the online test set.

ML · Oct 29, 2016
A general multiblock method for structured variable selection

Tommy Löfstedt, Fouad Hadj-Selem, Vincent Guillemot et al.

Regularised canonical correlation analysis was recently extended to more than two sets of variables by the multiblock method Regularised generalised canonical correlation analysis (RGCCA). Further, Sparse GCCA (SGCCA) was proposed to address the issue of variable selection. However, for technical reasons, the variable selection offered by SGCCA was restricted to a covariance link between the blocks (i.e., with $τ=1$). One of the main contributions of this paper is to go beyond the covariance link and to propose an extension of SGCCA for the full RGCCA model (i.e., with $τ\in[0, 1]$). In addition, we propose an extension of SGCCA that exploits structural relationships between variables within blocks. Specifically, we propose an algorithm that allows structured and sparsity-inducing penalties to be included in the RGCCA optimisation problem. The proposed multiblock method is illustrated on a real three-block high-grade glioma data set, where the aim is to predict the location of the brain tumours, and on a simulated data set, where the aim is to illustrate the method's ability to reconstruct the true underlying weight vectors.

ML · Sep 6, 2016
Structured Sparse Principal Components Analysis with the TV-Elastic Net penalty

Amicie de Pierrefeu, Tommy Löfstedt, Fouad Hadj-Selem et al.

Principal component analysis (PCA) is an exploratory tool widely used in data analysis to uncover dominant patterns of variability within a population. Despite its ability to represent a data set in a low-dimensional space, the interpretability of PCA remains limited. However, in neuroimaging, it is essential to uncover clinically interpretable phenotypic markers that would account for the main variability in the brain images of a population. Recently, some alternatives to the standard PCA approach, such as Sparse PCA, have been proposed, their aim being to limit the density of the components. Nonetheless, sparsity alone does not entirely solve the interpretability problem, since it may yield scattered and unstable components. We hypothesized that the incorporation of prior information regarding the structure of the data may lead to improved relevance and interpretability of brain patterns. We therefore present a simple extension of the popular PCA framework that adds structured sparsity penalties on the loading vectors in order to identify the few stable regions in the brain images accounting for most of the variability. Such structured sparsity can be obtained by combining l1 and total variation (TV) penalties, where the TV regularization encodes higher order information about the structure of the data. This paper presents the structured sparse PCA (denoted SPCA-TV) optimization framework and its resolution. We demonstrate the efficiency and versatility of SPCA-TV on three different data sets. The gains of SPCA-TV over unstructured approaches are significant, since SPCA-TV reveals the variability within a data set in the form of intelligible brain patterns that are easy to interpret, and are more stable across different samples.