Marc Streit

HC
12papers
231citations
Novelty37%
AI Score45

12 Papers

LGOct 10, 2022
ParaDime: A Framework for Parametric Dimensionality Reduction

Andreas Hinterreiter, Christina Humer, Bernhard Kainz et al.

ParaDime is a framework for parametric dimensionality reduction (DR). In parametric DR, neural networks are trained to embed high-dimensional data items in a low-dimensional space while minimizing an objective function. ParaDime builds on the idea that the objective functions of several modern DR techniques result from transformed inter-item relationships. It provides a common interface for specifying these relations and transformations and for defining how they are used within the losses that govern the training process. Through this interface, ParaDime unifies parametric versions of DR techniques such as metric MDS, t-SNE, and UMAP. It allows users to fully customize all aspects of the DR process. We show how this ease of customization makes ParaDime suitable for experimenting with interesting techniques such as hybrid classification/embedding models and supervised DR. This way, ParaDime opens up new possibilities for visualizing high-dimensional data.

37.4AIMay 7
Visual Fingerprints for LLM Generation Comparison

Amal Alnouri, Andreas Hinterreiter, Christina Humer et al.

Large language model (LLM) outputs arise from complex interactions among prompts, system instructions, model parameters, and architecture. We refer to specific configurations of these factors as generation conditions, each of which can bias outputs in various ways. Understanding how different generation conditions shape model behaviors is essential for tasks such as prompt design and model evaluation, yet it remains challenging due to the stochastic and open-ended nature of text generation. We present an approach to visually compare LLM outputs across generation conditions by modeling responses as collections of linguistic choices, including content, expression, and structure. We extract these choices using natural language processing pipelines and represent their distributions across repeated samples. We then visualize these distributions as visual fingerprints, enabling direct, distribution-level comparison of condition-specific tendencies. Through four usage scenarios, we demonstrate how visual fingerprints reveal consistent patterns in LLM behavior that are difficult to observe through individual responses or aggregate metrics.

18.9HCMay 7
EventColumn: Integrating Event Sequences into Tabular Visualizations

Jakob Zethofer, Andreas Hinterreiter, Lukas Schiefermüller et al.

We introduce EventColumn, a new column type that integrates event-sequence data with heterogeneous tabular attributes into a single unified table. EventColumn lets analysts compare event sequences alongside numerical, categorical, and temporal attributes at both instance and group levels, offering a compressed overview, heatmap group summaries, alignment by event types, and boxplots of similar historical items. We developed EventColumn together with collaborators from the steel industry to facilitate the analysis of production events and warehouse logistics, but the solution generalizes to a wide range of event sequence datasets with additional tabular attributes. Unlike most existing approaches that compare either event sequences or tables, EventColumn supports simultaneous comparison of both. We demonstrate its integration with Taggle and Microsoft Power BI on data from steel production logistics and on a public e-commerce dataset.

35.7LGApr 23
GFlowState: Visualizing the Training of Generative Flow Networks Beyond the Reward

Florian Holeczek, Andreas Hinterreiter, Alex Hernandez-Garcia et al.

We present GFlowState, a visual analytics system designed to illuminate the training process of Generative Flow Networks (GFlowNets or GFNs). GFlowNets are a probabilistic framework for generating samples proportionally to a reward function. While GFlowNets have proved to be powerful tools in applications such as molecule and material discovery, their training dynamics remain difficult to interpret. Standard machine learning tools allow metric tracking but do not reveal how models explore the sample space, construct sample trajectories, or shift sampling probabilities during training. Our solution, GFlowState, allows users to analyze sampling trajectories, compare the sample space relative to reference datasets, and analyze the training dynamics. To this end, we introduce multiple views, including a chart of candidate rankings, a state projection, a node-link diagram of the trajectory network, and a transition heatmap. These visualizations enable GFlowNet developers and users to investigate sampling behavior and policy evolution, and to identify underexplored regions and sources of training failure. Case studies demonstrate how the system supports debugging and assessing the quality of GFlowNets across application domains. By making the structural dynamics of GFlowNets observable, our work enhances their interpretability and can accelerate GFlowNet development in practice.

HCFeb 4, 2022
Perspectives of Visualization Onboarding and Guidance in VA

Christina Stoiber, Davide Ceneda, Markus Wagner et al.

A typical problem in Visual Analytics is that users are highly trained experts in their application domains, but have mostly no experience in using VA systems. Thus, users often have difficulties interpreting and working with visual representations. To overcome these problems, user assistance can be incorporated into VA systems to guide experts through the analysis while closing their knowledge gaps. Different types of user assistance can be applied to extend the power of VA, enhance the user's experience, and broaden the audience for VA. Although different approaches to visualization onboarding and guidance in VA already exist, there is a lack of research on how to design and integrate them in effective and efficient ways. Therefore, we aim at putting together the pieces of the mosaic to form a coherent whole. Based on the Knowledge-Assisted Visual Analytics model, we contribute a conceptual model of user assistance for VA by integrating the process of visualization onboarding and guidance as the two main approaches in this direction. As a result, we clarify and discuss the commonalities and differences between visualization onboarding and guidance, and discuss how they benefit from the integration of knowledge extraction and exploration. Finally, we discuss our descriptive model by applying it to VA tools integrating visualization onboarding and guidance, and showing how they should be utilized in different phases of the analysis in order to be effective and accepted by the user.

LGJul 22, 2020
InstanceFlow: Visualizing the Evolution of Classifier Confusion on the Instance Level

Michael Pühringer, Andreas Hinterreiter, Marc Streit

Classification is one of the most important supervised machine learning tasks. During the training of a classification model, the training instances are fed to the model multiple times (during multiple epochs) in order to iteratively increase the classification performance. The increasing complexity of models has led to a growing demand for model interpretability through visualizations. Existing approaches mostly focus on the visual analysis of the final model performance after training and are often limited to aggregate performance measures. In this paper we introduce InstanceFlow, a novel dual-view visualization tool that allows users to analyze the learning behavior of classifiers over time on the instance-level. A Sankey diagram visualizes the flow of instances throughout epochs, with on-demand detailed glyphs and traces for individual instances. A tabular view allows users to locate interesting instances by ranking and filtering. In this way, InstanceFlow bridges the gap between class-level and instance-level performance evaluation while enabling users to perform a full temporal analysis of the training process.

LGJun 23, 2020
Projective Latent Interventions for Understanding and Fine-tuning Classifiers

Andreas Hinterreiter, Marc Streit, Bernhard Kainz

High-dimensional latent representations learned by neural network classifiers are notoriously hard to interpret. Especially in medical applications, model developers and domain experts desire a better understanding of how these latent representations relate to the resulting classification performance. We present Projective Latent Interventions (PLIs), a technique for retraining classifiers by back-propagating manual changes made to low-dimensional embeddings of the latent space. The back-propagation is based on parametric approximations of t-distributed stochastic neighbourhood embeddings. PLIs allow domain experts to control the latent decision space in an intuitive way in order to better match their expectations. For instance, the performance for specific pairs of classes can be enhanced by manually separating the class clusters in the embedding. We evaluate our technique on a real-world scenario in fetal ultrasound imaging.

AIJan 20, 2020
ProjectionPathExplorer: Exploring Visual Patterns in Projected Decision-Making Paths

Andreas Hinterreiter, Christian Steinparz, Moritz Schöfl et al.

In problem-solving, a path towards solutions can be viewed as a sequence of decisions. The decisions, made by humans or computers, describe a trajectory through a high-dimensional representation space of the problem. By means of dimensionality reduction, these trajectories can be visualized in lower-dimensional space. Such embedded trajectories have previously been applied to a wide variety of data, but analysis has focused almost exclusively on the self-similarity of single trajectories. In contrast, we describe patterns emerging from drawing many trajectories -- for different initial conditions, end states, and solution strategies -- in the same embedding space. We argue that general statements about the problem-solving tasks and solving strategies can be made by interpreting these patterns. We explore and characterize such patterns in trajectories resulting from human and machine-made decisions in a variety of application domains: logic puzzles (Rubik's cube), strategy games (chess), and optimization problems (neural network training). We also discuss the importance of suitably chosen representation spaces and similarity metrics for the embedding.

LGOct 2, 2019
ConfusionFlow: A model-agnostic visualization for temporal analysis of classifier confusion

Andreas Hinterreiter, Peter Ruch, Holger Stitz et al.

Classifiers are among the most widely used supervised machine learning algorithms. Many classification models exist, and choosing the right one for a given task is difficult. During model selection and debugging, data scientists need to assess classifiers' performances, evaluate their learning behavior over time, and compare different models. Typically, this analysis is based on single-number performance measures such as accuracy. A more detailed evaluation of classifiers is possible by inspecting class errors. The confusion matrix is an established way for visualizing these class errors, but it was not designed with temporal or comparative analysis in mind. More generally, established performance analysis systems do not allow a combined temporal and comparative analysis of class-level information. To address this issue, we propose ConfusionFlow, an interactive, comparative visualization tool that combines the benefits of class confusion matrices with the visualization of performance characteristics over time. ConfusionFlow is model-agnostic and can be used to compare performances for different model types, model architectures, and/or training and test datasets. We demonstrate the usefulness of ConfusionFlow in a case study on instance selection strategies in active learning. We further assess the scalability of ConfusionFlow and present a use case in the context of neural network pruning.

HCApr 9, 2018
Juniper: A Tree+Table Approach to Multivariate Graph Visualization

Carolina Nobre, Marc Streit, Alexander Lex

Analyzing large, multivariate graphs is an important problem in many domains, yet such graphs are challenging to visualize. In this paper, we introduce a novel, scalable, tree+table multivariate graph visualization technique, which makes many tasks related to multivariate graph analysis easier to achieve. The core principle we follow is to selectively query for nodes or subgraphs of interest and visualize these subgraphs as a spanning tree of the graph. The tree is laid out linearly, which enables us to juxtapose the nodes with a table visualization where diverse attributes can be shown. We also use this table as an adjacency matrix, so that the resulting technique is a hybrid node-link/adjacency matrix technique. We implement this concept in Juniper and complement it with a set of interaction techniques that enable analysts to dynamically grow, restructure, and aggregate the tree, as well as change the layout or show paths between nodes. We demonstrate the utility of our tool in usage scenarios for different multivariate networks: a bipartite network of scholars, papers, and citation metrics and a multitype network of story characters, places, books, etc.

HCDec 16, 2017
Taggle: Combining Overview and Details in Tabular Data Visualizations

Katarina Furmanova, Samuel Gratzl, Holger Stitz et al.

Most tabular data visualization techniques focus on overviews, yet many practical analysis tasks are concerned with investigating individual items of interest. At the same time, relating an item to the rest of a potentially large table is important. In this work we present Taggle, a tabular visualization technique for exploring and presenting large and complex tables. Taggle takes an item-centric, spreadsheet-like approach, visualizing each row in the source data individually using visual encodings for the cells. At the same time, Taggle introduces data-driven aggregation of data subsets. The aggregation strategy is complemented by interaction methods tailored to answer specific analysis questions, such as sorting based on multiple columns and rich data selection and filtering capabilities. We demonstrate Taggle using a case study conducted by a domain expert on complex genomics data analysis for the purpose of drug discovery.

HCOct 18, 2017
Amending the Characterization of Guidance in Visual Analytics

Davide Ceneda, Theresia Gschwandtner, Thorsten May et al.

At VAST 2016, a characterization of guidance has been presented. It includes a definition of guidance and a model of guidance based on van Wijk's model of visualization. This note amends the original characterization of guidance in two aspects. First, we provide a clarification of what guidance actually is (and is not). Second, we insert into the model a conceptually relevant link that was missing in the original version.