Thomas Höllt

HC
h-index41
7papers
318citations
Novelty36%
AI Score26

7 Papers

LGAug 29, 2023
Navigating Perplexity: A linear relationship with the data set size in t-SNE embeddings

Martin Skrodzki, Nicolas F. Chaves-de-Plaza, Thomas Höllt et al.

Widely used pipelines for analyzing high-dimensional data utilize two-dimensional visualizations. These are created, for instance, via t-distributed stochastic neighbor embedding (t-SNE). A crucial element of the t-SNE embedding procedure is the perplexity hyperparameter. That is because the embedding structure varies when perplexity is changed. A suitable perplexity choice depends on the data set and the intended usage for the embedding. Therefore, perplexity is often chosen based on heuristics, intuition, and prior experience. This paper uncovers a linear relationship between perplexity and the data set size. Namely, we show that embeddings remain structurally consistent across data set samples when perplexity is adjusted accordingly. Qualitative and quantitative experimental results support these findings. This informs the visualization process, guiding the user when choosing a perplexity value. Finally, we outline several applications for the visualization of high-dimensional data via t-SNE based on this linear relationship.

HCJan 23, 2024
Accelerating hyperbolic t-SNE

Martin Skrodzki, Hunter van Geffen, Nicolas F. Chaves-de-Plaza et al.

The need to understand the structure of hierarchical or high-dimensional data is present in a variety of fields. Hyperbolic spaces have proven to be an important tool for embedding computations and analysis tasks as their non-linear nature lends itself well to tree or graph data. Subsequently, they have also been used in the visualization of high-dimensional data, where they exhibit increased embedding performance. However, none of the existing dimensionality reduction methods for embedding into hyperbolic spaces scale well with the size of the input data. That is because the embeddings are computed via iterative optimization schemes and the computation cost of every iteration is quadratic in the size of the input. Furthermore, due to the non-linear nature of hyperbolic spaces, Euclidean acceleration structures cannot directly be translated to the hyperbolic setting. This paper introduces the first acceleration structure for hyperbolic embeddings, building upon a polar quadtree. We compare our approach with existing methods and demonstrate that it computes embeddings of similar quality in significantly less time. Implementation and scripts for the experiments can be found at https://graphics.tudelft.nl/accelerating-hyperbolic-tsne.

HCDec 20, 2024
AI-in-the-loop: The future of biomedical visual analytics applications in the era of AI

Katja Bühler, Thomas Höllt, Thomas Schulz et al.

AI is the workhorse of modern data analytics and omnipresent across many sectors. Large Language Models and multi-modal foundation models are today capable of generating code, charts, visualizations, etc. How will these massive developments of AI in data analytics shape future data visualizations and visual analytics workflows? What is the potential of AI to reshape methodology and design of future visual analytics applications? What will be our role as visualization researchers in the future? What are opportunities, open challenges and threats in the context of an increasingly powerful AI? This Visualization Viewpoint discusses these questions in the special context of biomedical data analytics as an example of a domain in which critical decisions are taken based on complex and sensitive data, with high requirements on transparency, efficiency, and reliability. We map recent trends and developments in AI on the elements of interactive visualization and visual analytics workflows and highlight the potential of AI to transform biomedical visualization as a research field. Given that agency and responsibility have to remain with human experts, we argue that it is helpful to keep the focus on human-centered workflows, and to use visual analytics as a tool for integrating ``AI-in-the-loop''. This is in contrast to the more traditional term ``human-in-the-loop'', which focuses on incorporating human expertise into AI-based systems.

CVFeb 18, 2022
Incorporating Texture Information into Dimensionality Reduction for High-Dimensional Images

Alexander Vieth, Anna Vilanova, Boudewijn Lelieveldt et al.

High-dimensional imaging is becoming increasingly relevant in many fields from astronomy and cultural heritage to systems biology. Visual exploration of such high-dimensional data is commonly facilitated by dimensionality reduction. However, common dimensionality reduction methods do not include spatial information present in images, such as local texture features, into the construction of low-dimensional embeddings. Consequently, exploration of such data is typically split into a step focusing on the attribute space followed by a step focusing on spatial information, or vice versa. In this paper, we present a method for incorporating spatial neighborhood information into distance-based dimensionality reduction methods, such as t-Distributed Stochastic Neighbor Embedding (t-SNE). We achieve this by modifying the distance measure between high-dimensional attribute vectors associated with each pixel such that it takes the pixel's spatial neighborhood into account. Based on a classification of different methods for comparing image patches, we explore a number of different approaches. We compare these approaches from a theoretical and experimental point of view. Finally, we illustrate the value of the proposed methods by qualitative and quantitative evaluation on synthetic data and two real-world use cases.

GRJul 22, 2020
InCorr: Interactive Data-Driven Correlation Panels for Digital Outcrop Analysis

Thomas Ortner, Andreas Walch, Rebecca Nowak et al.

Geological analysis of 3D Digital Outcrop Models (DOMs) for reconstruction of ancient habitable environments is a key aspect of the upcoming ESA ExoMars 2022 Rosalind Franklin Rover and the NASA 2020 Rover Perseverance missions in seeking signs of past life on Mars. Geologists measure and interpret 3D DOMs, create sedimentary logs and combine them in `correlation panels' to map the extents of key geological horizons, and build a stratigraphic model to understand their position in the ancient landscape. Currently, the creation of correlation panels is completely manual and therefore time-consuming, and inflexible. With InCorr we present a visualization solution that encompasses a 3D logging tool and an interactive data-driven correlation panel that evolves with the stratigraphic analysis. For the creation of InCorr we closely cooperated with leading planetary geologists in the form of a design study. We verify our results by recreating an existing correlation analysis with InCorr and validate our correlation panel against a manually created illustration. Further, we conducted a user-study with a wider circle of geologists. Our evaluation shows that InCorr efficiently supports the domain experts in tackling their research questions and that it has the potential to significantly impact how geologists work with digital outcrop representations in general.

HCJun 9, 2020
Visual cohort comparison for spatial single-cell omics-data

Antonios Somarakis, Marieke E. Ijsselsteijn, Sietse J. Luk et al.

Spatially-resolved omics-data enable researchers to precisely distinguish cell types in tissue and explore their spatial interactions, enabling deep understanding of tissue functionality. To understand what causes or deteriorates a disease and identify related biomarkers, clinical researchers regularly perform large-scale cohort studies, requiring the comparison of such data at cellular level. In such studies, with little a-priori knowledge of what to expect in the data, explorative data analysis is a necessity. Here, we present an interactive visual analysis workflow for the comparison of cohorts of spatially-resolved omics-data. Our workflow allows the comparative analysis of two cohorts based on multiple levels-of-detail, from simple abundance of contained cell types over complex co-localization patterns to individual comparison of complete tissue images. As a result, the workflow enables the identification of cohort-differentiating features, as well as outlier samples at any stage of the workflow. During the development of the workflow, we continuously consulted with domain experts. To show the effectiveness of the workflow we conducted multiple case studies with domain experts from different application areas and with different data modalities.

CVDec 5, 2015
Approximated and User Steerable tSNE for Progressive Visual Analytics

Nicola Pezzotti, Boudewijn P. F. Lelieveldt, Laurens van der Maaten et al.

Progressive Visual Analytics aims at improving the interactivity in existing analytics techniques by means of visualization as well as interaction with intermediate results. One key method for data analysis is dimensionality reduction, for example, to produce 2D embeddings that can be visualized and analyzed efficiently. t-Distributed Stochastic Neighbor Embedding (tSNE) is a well-suited technique for the visualization of several high-dimensional data. tSNE can create meaningful intermediate results but suffers from a slow initialization that constrains its application in Progressive Visual Analytics. We introduce a controllable tSNE approximation (A-tSNE), which trades off speed and accuracy, to enable interactive data exploration. We offer real-time visualization techniques, including a density-based solution and a Magic Lens to inspect the degree of approximation. With this feedback, the user can decide on local refinements and steer the approximation level during the analysis. We demonstrate our technique with several datasets, in a real-world research scenario and for the real-time analysis of high-dimensional streams to illustrate its effectiveness for interactive data analysis.