Thomas Höllt

h-index33

10papers

336citations

Novelty41%

AI Score43

Ranked #52,825 of 194,257 authors (top 27%)#18,375 in CV (top 31%)

10 Papers

2.0LGAug 29, 2023

Navigating Perplexity: A linear relationship with the data set size in t-SNE embeddings

Martin Skrodzki, Nicolas F. Chaves-de-Plaza, Thomas Höllt et al.

Widely used pipelines for analyzing high-dimensional data utilize two-dimensional visualizations. These are created, for instance, via t-distributed stochastic neighbor embedding (t-SNE). A crucial element of the t-SNE embedding procedure is the perplexity hyperparameter. That is because the embedding structure varies when perplexity is changed. A suitable perplexity choice depends on the data set and the intended usage for the embedding. Therefore, perplexity is often chosen based on heuristics, intuition, and prior experience. This paper uncovers a linear relationship between perplexity and the data set size. Namely, we show that embeddings remain structurally consistent across data set samples when perplexity is adjusted accordingly. Qualitative and quantitative experimental results support these findings. This informs the visualization process, guiding the user when choosing a perplexity value. Finally, we outline several applications for the visualization of high-dimensional data via t-SNE based on this linear relationship.

7.3LGApr 2

A Spectral Framework for Multi-Scale Nonlinear Dimensionality Reduction

Zeyang Huang, Angelos Chatzimparmpas, Thomas Höllt et al.

Dimensionality reduction (DR) is characterized by two longstanding trade-offs. First, there is a global-local preservation tension: methods such as t-SNE and UMAP prioritize local neighborhood preservation, yet may distort global manifold structure, while methods such as Laplacian Eigenmaps preserve global geometry but often yield limited local separation. Second, there is a gap between expressiveness and analytical transparency: many nonlinear DR methods produce embeddings without an explicit connection to the underlying high-dimensional structure, limiting insight into the embedding process. In this paper, we introduce a spectral framework for nonlinear DR that addresses these challenges. Our approach embeds high-dimensional data using a spectral basis combined with cross-entropy optimization, enabling multi-scale representations that bridge global and local structure. Leveraging linear spectral decomposition, the framework further supports analysis of embeddings through a graph-frequency perspective, enabling examination of how spectral modes influence the resulting embedding. We complement this analysis with glyph-based scatterplot augmentations for visual exploration. Quantitative evaluations and case studies demonstrate that our framework improves manifold continuity while enabling deeper analysis of embedding structure through spectral mode contributions.

6.7HCJan 23, 2024

Accelerating hyperbolic t-SNE

Martin Skrodzki, Hunter van Geffen, Nicolas F. Chaves-de-Plaza et al.

The need to understand the structure of hierarchical or high-dimensional data is present in a variety of fields. Hyperbolic spaces have proven to be an important tool for embedding computations and analysis tasks as their non-linear nature lends itself well to tree or graph data. Subsequently, they have also been used in the visualization of high-dimensional data, where they exhibit increased embedding performance. However, none of the existing dimensionality reduction methods for embedding into hyperbolic spaces scale well with the size of the input data. That is because the embeddings are computed via iterative optimization schemes and the computation cost of every iteration is quadratic in the size of the input. Furthermore, due to the non-linear nature of hyperbolic spaces, Euclidean acceleration structures cannot directly be translated to the hyperbolic setting. This paper introduces the first acceleration structure for hyperbolic embeddings, building upon a polar quadtree. We compare our approach with existing methods and demonstrate that it computes embeddings of similar quality in significantly less time. Implementation and scripts for the experiments can be found at https://graphics.tudelft.nl/accelerating-hyperbolic-tsne.

1.5CVFeb 27

Manifold-Preserving Superpixel Hierarchies and Embeddings for the Exploration of High-Dimensional Images

Alexander Vieth, Boudewijn Lelieveldt, Elmar Eisemann et al.

High-dimensional images, or images with a high-dimensional attribute vector per pixel, are commonly explored with coordinated views of a low-dimensional embedding of the attribute space and a conventional image representation. Nowadays, such images can easily contain several million pixels. For such large datasets, hierarchical embedding techniques are better suited to represent the high-dimensional attribute space than flat dimensionality reduction methods. However, available hierarchical dimensionality reduction methods construct the hierarchy purely based on the attribute information and ignore the spatial layout of pixels in the images. This impedes the exploration of regions of interest in the image space, since there is no congruence between a region of interest in image space and the associated attribute abstractions in the hierarchy. In this paper, we present a superpixel hierarchy for high-dimensional images that takes the high-dimensional attribute manifold into account during construction. Through this, our method enables consistent exploration of high-dimensional images in both image and attribute space. We show the effectiveness of this new image-guided hierarchy in the context of embedding exploration by comparing it with classical hierarchical embedding-based image exploration in two use cases.

2.7HCDec 20, 2024

AI-in-the-loop: The future of biomedical visual analytics applications in the era of AI

Katja Bühler, Thomas Höllt, Thomas Schulz et al.

AI is the workhorse of modern data analytics and omnipresent across many sectors. Large Language Models and multi-modal foundation models are today capable of generating code, charts, visualizations, etc. How will these massive developments of AI in data analytics shape future data visualizations and visual analytics workflows? What is the potential of AI to reshape methodology and design of future visual analytics applications? What will be our role as visualization researchers in the future? What are opportunities, open challenges and threats in the context of an increasingly powerful AI? This Visualization Viewpoint discusses these questions in the special context of biomedical data analytics as an example of a domain in which critical decisions are taken based on complex and sensitive data, with high requirements on transparency, efficiency, and reliability. We map recent trends and developments in AI on the elements of interactive visualization and visual analytics workflows and highlight the potential of AI to transform biomedical visualization as a research field. Given that agency and responsibility have to remain with human experts, we argue that it is helpful to keep the focus on human-centered workflows, and to use visual analytics as a tool for integrating ``AI-in-the-loop''. This is in contrast to the more traditional term ``human-in-the-loop'', which focuses on incorporating human expertise into AI-based systems.

2.6CVFeb 18, 2022Code

Incorporating Texture Information into Dimensionality Reduction for High-Dimensional Images

Alexander Vieth, Anna Vilanova, Boudewijn Lelieveldt et al.

High-dimensional imaging is becoming increasingly relevant in many fields from astronomy and cultural heritage to systems biology. Visual exploration of such high-dimensional data is commonly facilitated by dimensionality reduction. However, common dimensionality reduction methods do not include spatial information present in images, such as local texture features, into the construction of low-dimensional embeddings. Consequently, exploration of such data is typically split into a step focusing on the attribute space followed by a step focusing on spatial information, or vice versa. In this paper, we present a method for incorporating spatial neighborhood information into distance-based dimensionality reduction methods, such as t-Distributed Stochastic Neighbor Embedding (t-SNE). We achieve this by modifying the distance measure between high-dimensional attribute vectors associated with each pixel such that it takes the pixel's spatial neighborhood into account. Based on a classification of different methods for comparing image patches, we explore a number of different approaches. We compare these approaches from a theoretical and experimental point of view. Finally, we illustrate the value of the proposed methods by qualitative and quantitative evaluation on synthetic data and two real-world use cases.

1.2GRJul 22, 2020

InCorr: Interactive Data-Driven Correlation Panels for Digital Outcrop Analysis

Thomas Ortner, Andreas Walch, Rebecca Nowak et al.

Geological analysis of 3D Digital Outcrop Models (DOMs) for reconstruction of ancient habitable environments is a key aspect of the upcoming ESA ExoMars 2022 Rosalind Franklin Rover and the NASA 2020 Rover Perseverance missions in seeking signs of past life on Mars. Geologists measure and interpret 3D DOMs, create sedimentary logs and combine them in `correlation panels' to map the extents of key geological horizons, and build a stratigraphic model to understand their position in the ancient landscape. Currently, the creation of correlation panels is completely manual and therefore time-consuming, and inflexible. With InCorr we present a visualization solution that encompasses a 3D logging tool and an interactive data-driven correlation panel that evolves with the stratigraphic analysis. For the creation of InCorr we closely cooperated with leading planetary geologists in the form of a design study. We verify our results by recreating an existing correlation analysis with InCorr and validate our correlation panel against a manually created illustration. Further, we conducted a user-study with a wider circle of geologists. Our evaluation shows that InCorr efficiently supports the domain experts in tackling their research questions and that it has the potential to significantly impact how geologists work with digital outcrop representations in general.

7.9HCJun 9, 2020

Visual cohort comparison for spatial single-cell omics-data

Antonios Somarakis, Marieke E. Ijsselsteijn, Sietse J. Luk et al.

Spatially-resolved omics-data enable researchers to precisely distinguish cell types in tissue and explore their spatial interactions, enabling deep understanding of tissue functionality. To understand what causes or deteriorates a disease and identify related biomarkers, clinical researchers regularly perform large-scale cohort studies, requiring the comparison of such data at cellular level. In such studies, with little a-priori knowledge of what to expect in the data, explorative data analysis is a necessity. Here, we present an interactive visual analysis workflow for the comparison of cohorts of spatially-resolved omics-data. Our workflow allows the comparative analysis of two cohorts based on multiple levels-of-detail, from simple abundance of contained cell types over complex co-localization patterns to individual comparison of complete tissue images. As a result, the workflow enables the identification of cohort-differentiating features, as well as outlier samples at any stage of the workflow. During the development of the workflow, we continuously consulted with domain experts. To show the effectiveness of the workflow we conducted multiple case studies with domain experts from different application areas and with different data modalities.

11.1LGMay 28, 2018Code

GPGPU Linear Complexity t-SNE Optimization

Nicola Pezzotti, Julian Thijssen, Alexander Mordvintsev et al.

The t-distributed Stochastic Neighbor Embedding (tSNE) algorithm has become in recent years one of the most used and insightful techniques for the exploratory data analysis of high-dimensional data. tSNE reveals clusters of high-dimensional data points at different scales while it requires only minimal tuning of its parameters. Despite these advantages, the computational complexity of the algorithm limits its application to relatively small datasets. To address this problem, several evolutions of tSNE have been developed in recent years, mainly focusing on the scalability of the similarity computations between data points. However, these contributions are insufficient to achieve interactive rates when visualizing the evolution of the tSNE embedding for large datasets. In this work, we present a novel approach to the minimization of the tSNE objective function that heavily relies on modern graphics hardware and has linear computational complexity. Our technique does not only beat the state of the art, but can even be executed on the client side in a browser. We propose to approximate the repulsion forces between data points using adaptive-resolution textures that are drawn at every iteration with WebGL. This approximation allows us to reformulate the tSNE minimization problem as a series of tensor operation that are computed with TensorFlow.js, a JavaScript library for scalable tensor computations.

17.1CVDec 5, 2015

Approximated and User Steerable tSNE for Progressive Visual Analytics

Nicola Pezzotti, Boudewijn P. F. Lelieveldt, Laurens van der Maaten et al.

Progressive Visual Analytics aims at improving the interactivity in existing analytics techniques by means of visualization as well as interaction with intermediate results. One key method for data analysis is dimensionality reduction, for example, to produce 2D embeddings that can be visualized and analyzed efficiently. t-Distributed Stochastic Neighbor Embedding (tSNE) is a well-suited technique for the visualization of several high-dimensional data. tSNE can create meaningful intermediate results but suffers from a slow initialization that constrains its application in Progressive Visual Analytics. We introduce a controllable tSNE approximation (A-tSNE), which trades off speed and accuracy, to enable interactive data exploration. We offer real-time visualization techniques, including a density-based solution and a Magic Lens to inspect the degree of approximation. With this feedback, the user can decide on local refinements and steer the approximation level during the analysis. We demonstrate our technique with several datasets, in a real-world research scenario and for the real-time analysis of high-dimensional streams to illustrate its effectiveness for interactive data analysis.