LGAug 21, 2025
Low-dimensional embeddings of high-dimensional dataCyril de Bodt, Alex Diaz-Papkovich, Michael Bleher et al.
Large collections of high-dimensional data have become nearly ubiquitous across many academic fields and application domains, ranging from biology to the humanities. Since working directly with high-dimensional data poses challenges, the demand for algorithms that create low-dimensional representations, or embeddings, for data visualization, exploration, and analysis is now greater than ever. In recent years, numerous embedding algorithms have been developed, and their usage has become widespread in research and industry. This surge of interest has resulted in a large and fragmented research field that faces technical challenges alongside fundamental debates, and it has left practitioners without clear guidance on how to effectively employ existing methods. Aiming to increase coherence and facilitate future work, in this review we provide a detailed and critical overview of recent developments, derive a list of best practices for creating and using low-dimensional embeddings, evaluate popular approaches on a variety of datasets, and discuss the remaining challenges and open problems in the field.
LGOct 3, 2020
Perplexity-free Parametric t-SNEFrancesco Crecchi, Cyril de Bodt, Michel Verleysen et al.
The t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm is a ubiquitously employed dimensionality reduction (DR) method. Its non-parametric nature and impressive efficacy motivated its parametric extension. It is however bounded to a user-defined perplexity parameter, restricting its DR quality compared to recently developed multi-scale perplexity-free approaches. This paper hence proposes a multi-scale parametric t-SNE scheme, relieved from the perplexity tuning and with a deep neural network implementing the mapping. It produces reliable embeddings with out-of-sample extensions, competitive with the best perplexity adjustments in terms of neighborhood preservation on multiple data sets.
CVSep 2, 2020
Deep Learning to Detect Bacterial Colonies for the Production of VaccinesThomas Beznik, Paul Smyth, Gaël de Lannoy et al.
During the development of vaccines, bacterial colony forming units (CFUs) are counted in order to quantify the yield in the fermentation process. This manual task is time-consuming and error-prone. In this work we test multiple segmentation algorithms based on the U-Net CNN architecture and show that these offer robust, automated CFU counting. We show that the multiclass generalisation with a bespoke loss function allows distinguishing virulent and avirulent colonies with acceptable accuracy. While many possibilities are left to explore, our results show the potential of deep learning for separating and classifying bacterial colonies.
CVDec 3, 2018
Knowing what you know in brain segmentation using Bayesian deep neural networksPatrick McClure, Nao Rho, John A. Lee et al.
In this paper, we describe a Bayesian deep neural network (DNN) for predicting FreeSurfer segmentations of structural MRI volumes, in minutes rather than hours. The network was trained and evaluated on a large dataset (n = 11,480), obtained by combining data from more than a hundred different sites, and also evaluated on another completely held-out dataset (n = 418). The network was trained using a novel spike-and-slab dropout-based variational inference approach. We show that, on these datasets, the proposed Bayesian DNN outperforms previously proposed methods, in terms of the similarity between the segmentation predictions and the FreeSurfer labels, and the usefulness of the estimate uncertainty of these predictions. In particular, we demonstrated that the prediction uncertainty of this network at each voxel is a good indicator of whether the network has made an error and that the uncertainty across the whole brain can predict the manual quality control ratings of a scan. The proposed Bayesian DNN method should be applicable to any new network architecture for addressing the segmentation problem.
CVMay 29, 2018
Capturing Variabilities from Computed Tomography Images with Generative Adversarial NetworksUmair Javaid, John A. Lee
With the advent of Deep Learning (DL) techniques, especially Generative Adversarial Networks (GANs), data augmentation and generation are quickly evolving domains that have raised much interest recently. However, the DL techniques are data demanding and since, medical data is not easily accessible, they suffer from data insufficiency. To deal with this limitation, different data augmentation techniques are used. Here, we propose a novel unsupervised data-driven approach for data augmentation that can generate 2D Computed Tomography (CT) images using a simple GAN. The generated CT images have good global and local features of a real CT image and can be used to augment the training datasets for effective learning. In this proof-of-concept study, we show that our proposed solution using GANs is able to capture some of the global and local CT variabilities. Our network is able to generate visually realistic CT images and we aim to further enhance its output by scaling it to a higher resolution and potentially from 2D to 3D.
LGMay 28, 2018
Distributed Weight Consolidation: A Brain Segmentation Case StudyPatrick McClure, Charles Y. Zheng, Jakub R. Kaczmarzyk et al.
Collecting the large datasets needed to train deep neural networks can be very difficult, particularly for the many applications for which sharing and pooling data is complicated by practical, ethical, or legal concerns. However, it may be the case that derivative datasets or predictive models developed within individual sites can be shared and combined with fewer restrictions. Training on distributed data and combining the resulting networks is often viewed as continual learning, but these methods require networks to be trained sequentially. In this paper, we introduce distributed weight consolidation (DWC), a continual learning method to consolidate the weights of separate neural networks, each trained on an independent dataset. We evaluated DWC with a brain segmentation case study, where we consolidated dilated convolutional neural networks trained on independent structural magnetic resonance imaging (sMRI) datasets from different sites. We found that DWC led to increased performance on test sets from the different sites, while maintaining generalization performance for a very large and completely independent multi-site dataset, compared to an ensemble baseline.
CVAug 5, 2016
Blind Deconvolution of PET Images using Anatomical PriorsStéphanie Guérit, Adriana González, Anne Bol et al.
Images from positron emission tomography (PET) provide metabolic information about the human body. They present, however, a spatial resolution that is limited by physical and instrumental factors often modeled by a blurring function. Since this function is typically unknown, blind deconvolution (BD) techniques are needed in order to produce a useful restored PET image. In this work, we propose a general BD technique that restores a low resolution blurry image using information from data acquired with a high resolution modality (e.g., CT-based delineation of regions with uniform activity in PET images). The proposed BD method is validated on synthetic and actual phantoms.
CVJun 16, 2015
Post-Reconstruction Deconvolution of PET Images by Total Generalized Variation RegularizationStéphanie Guérit, Laurent Jacques, Benoît Macq et al.
Improving the quality of positron emission tomography (PET) images, affected by low resolution and high level of noise, is a challenging task in nuclear medicine and radiotherapy. This work proposes a restoration method, achieved after tomographic reconstruction of the images and targeting clinical situations where raw data are often not accessible. Based on inverse problem methods, our contribution introduces the recently developed total generalized variation (TGV) norm to regularize PET image deconvolution. Moreover, we stabilize this procedure with additional image constraints such as positivity and photometry invariance. A criterion for updating and adjusting automatically the regularization parameter in case of Poisson noise is also presented. Experiments are conducted on both synthetic data and real patient images.