LG · Oct 10, 2022
ParaDime: A Framework for Parametric Dimensionality Reduction
Andreas Hinterreiter, Christina Humer, Bernhard Kainz et al.
ParaDime is a framework for parametric dimensionality reduction (DR). In parametric DR, neural networks are trained to embed high-dimensional data items in a low-dimensional space while minimizing an objective function. ParaDime builds on the idea that the objective functions of several modern DR techniques result from transformed inter-item relationships. It provides a common interface for specifying these relations and transformations and for defining how they are used within the losses that govern the training process. Through this interface, ParaDime unifies parametric versions of DR techniques such as metric MDS, t-SNE, and UMAP. It allows users to fully customize all aspects of the DR process. We show how this ease of customization makes ParaDime suitable for experimenting with interesting techniques such as hybrid classification/embedding models and supervised DR. This way, ParaDime opens up new possibilities for visualizing high-dimensional data.
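To illustrate the relations, transformations, and losses decomposition described above, here is a minimal NumPy sketch. It is not ParaDime's API. It assumes a linear map `W` in place of a neural network, uses pairwise Euclidean distances as the relation, the identity as the transformation, and metric MDS stress as the loss, and trains with plain gradient descent. The function names `pairwise_distances`, `stress`, and `fit_linear_mds` are hypothetical.

```python
import numpy as np

def pairwise_distances(Z):
    """Euclidean distance matrix between all rows of Z (the inter-item relation)."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(Y, D_high):
    """Metric MDS loss: mean squared mismatch between low- and high-dimensional distances."""
    n = Y.shape[0]
    return ((pairwise_distances(Y) - D_high) ** 2).sum() / (n * (n - 1))

def fit_linear_mds(X, dim=2, lr=0.05, steps=300, seed=0):
    """Train a linear map W (hypothetical stand-in for a network) so that X @ W preserves distances."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(X.shape[1], dim))
    D_high = pairwise_distances(X)  # relation computed once; transformation is the identity
    history = []
    for _ in range(steps):
        Y = X @ W
        history.append(stress(Y, D_high))
        D_low = pairwise_distances(Y)
        # d(stress)/dy_i = 4/(n(n-1)) * sum_j (d_ij - D_ij) (y_i - y_j) / d_ij;
        # entries with d_ij = 0 (e.g. the diagonal) contribute nothing
        ratio = np.divide(D_low - D_high, D_low,
                          out=np.zeros_like(D_low), where=D_low > 0)
        diff = Y[:, None, :] - Y[None, :, :]
        grad_Y = 4.0 / (n * (n - 1)) * (ratio[:, :, None] * diff).sum(axis=1)
        # chain rule through Y = X @ W
        W -= lr * (X.T @ grad_Y)
    return W, history
```

In this sketch, swapping the relation (for example, to neighborhood affinities) or the loss is what would move the objective toward t-SNE- or UMAP-like behavior; the point is only that the three pieces are separable.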
HC · May 15
GEMS -- Guided Evolutionary Molecule Design for Sustainable Chemicals
Coelina Robinson, Franziska Weissbach, Kjell Jorner et al.
Designing safe and sustainable chemicals is critical for combating chemical pollution in our environment. Machine learning (ML) methods have been developed to aid de novo molecule design. However, data on the environmental impacts of chemical compounds are sparse, resulting in low-fidelity ML oracles and unreliable candidate proposals. Furthermore, generative ML models rely on numerical scoring functions that cannot fully capture the nuanced chemical intuition that expert scientists bring to real-world molecular design. We present GEMS, an interactive visual analytics tool that enables domain experts to collaborate directly with a genetic algorithm for molecule design. Users can integrate their expert knowledge to guide the evolutionary process by modifying the scoring function and the molecule population, without programming knowledge or ML developer support. A usage scenario demonstrates the system's application in designing sustainable antioxidant alternatives. In an interview session with domain scientists, we collected feedback on the usefulness of GEMS.
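To make the idea of steering a genetic algorithm through an editable scoring function concrete, here is a toy Python sketch. It is not GEMS's implementation: candidates are plain bit strings standing in for molecules, `make_score` combines two hypothetical objectives with expert-adjustable weights, and `evolve` is a simple elitist GA. All names and objectives are assumptions for illustration.

```python
import random
from typing import Callable, Dict, List

Genome = List[int]

def make_score(weights: Dict[str, float]) -> Callable[[Genome], float]:
    """Hypothetical weighted scoring function; an expert would edit `weights` between runs."""
    def score(g: Genome) -> float:
        half = len(g) // 2
        activity = sum(g[:half])  # toy proxy for a desired property
        toxicity = sum(g[half:])  # toy proxy for an undesired property
        return weights["activity"] * activity - weights["toxicity"] * toxicity
    return score

def mutate(g: Genome, rate: float, rng: random.Random) -> Genome:
    # flip each bit independently with probability `rate`
    return [1 - bit if rng.random() < rate else bit for bit in g]

def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    # single-point crossover
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(population: List[Genome], score: Callable[[Genome], float],
           generations: int = 30, elite: int = 4, rate: float = 0.05,
           seed: int = 0) -> List[Genome]:
    """Elitist GA; returns the final population sorted best-first."""
    rng = random.Random(seed)
    for _ in range(generations):
        parents = sorted(population, key=score, reverse=True)[:elite]
        children = []
        while len(children) < len(population) - elite:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rate, rng))
        population = parents + children  # elites survive unchanged
    return sorted(population, key=score, reverse=True)
```

Because elites are carried over, the best score never decreases. In this reduced setting, an expert could change the weights passed to `make_score` or append hand-picked candidates to the population before calling `evolve` again.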
AI · May 7
Visual Fingerprints for LLM Generation Comparison
Amal Alnouri, Andreas Hinterreiter, Christina Humer et al.
Large language model (LLM) outputs arise from complex interactions among prompts, system instructions, model parameters, and architecture. We refer to specific configurations of these factors as generation conditions, each of which can bias outputs in various ways. Understanding how different generation conditions shape model behaviors is essential for tasks such as prompt design and model evaluation, yet it remains challenging due to the stochastic and open-ended nature of text generation. We present an approach to visually compare LLM outputs across generation conditions by modeling responses as collections of linguistic choices, including content, expression, and structure. We extract these choices using natural language processing pipelines and represent their distributions across repeated samples. We then visualize these distributions as visual fingerprints, enabling direct, distribution-level comparison of condition-specific tendencies. Through four usage scenarios, we demonstrate how visual fingerprints reveal consistent patterns in LLM behavior that are difficult to observe through individual responses or aggregate metrics.
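The distribution-level idea can be sketched in a few lines of Python. The paper extracts linguistic choices with NLP pipelines; here those are replaced by two hypothetical, deliberately crude extractors (list vs. prose structure, short vs. long length). Each generation condition becomes a set of per-dimension choice frequencies over repeated samples. Comparing two conditions with total variation distance is an assumption of this sketch; the paper compares fingerprints visually.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical stand-ins for the NLP pipelines: each maps a response to one categorical choice.
EXTRACTORS: Dict[str, Callable[[str], str]] = {
    "structure": lambda r: "list" if any(
        line.lstrip().startswith(("-", "*")) for line in r.splitlines()) else "prose",
    "length": lambda r: "short" if len(r.split()) < 50 else "long",
}

def fingerprint(responses: List[str]) -> Dict[str, Dict[str, float]]:
    """Per-dimension distribution of choices over repeated samples of one condition."""
    fp = {}
    for name, extract in EXTRACTORS.items():
        counts = Counter(extract(r) for r in responses)
        total = sum(counts.values())
        fp[name] = {choice: n / total for choice, n in counts.items()}
    return fp

def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def compare(fp_a, fp_b) -> Dict[str, float]:
    """Distance between two conditions, one value per linguistic dimension."""
    return {name: total_variation(fp_a[name], fp_b[name]) for name in fp_a}
```

Working on distributions rather than single responses is what lets a consistent tendency (for example, one condition favoring bulleted answers) show up despite sampling noise.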
LG · Apr 23
GFlowState: Visualizing the Training of Generative Flow Networks Beyond the Reward
Florian Holeczek, Andreas Hinterreiter, Alex Hernandez-Garcia et al.
We present GFlowState, a visual analytics system designed to illuminate the training process of Generative Flow Networks (GFlowNets or GFNs). GFlowNets are a probabilistic framework for generating samples proportionally to a reward function. While GFlowNets have proved to be powerful tools in applications such as molecule and material discovery, their training dynamics remain difficult to interpret. Standard machine learning tools allow metric tracking but do not reveal how models explore the sample space, construct sample trajectories, or shift sampling probabilities during training. Our solution, GFlowState, allows users to analyze sampling trajectories, compare the sample space relative to reference datasets, and analyze the training dynamics. To this end, we introduce multiple views, including a chart of candidate rankings, a state projection, a node-link diagram of the trajectory network, and a transition heatmap. These visualizations enable GFlowNet developers and users to investigate sampling behavior and policy evolution, and to identify underexplored regions and sources of training failure. Case studies demonstrate how the system supports debugging and assessing the quality of GFlowNets across application domains. By making the structural dynamics of GFlowNets observable, our work enhances their interpretability and can accelerate GFlowNet development in practice.
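A GFlowNet aims to sample terminal objects x with probability proportional to their reward, p(x) = R(x) / Z with Z = Σ R(x). On a small discrete space this target can be computed exactly, which gives a simple numerical counterpart to comparing the sampled space against a reference. The sketch below, with hypothetical helper names, measures the L1 gap between empirical sample frequencies and the target and flags states sampled far less often than their reward share predicts; the L1 metric and the 0.5 threshold are assumptions.

```python
from collections import Counter
from typing import Dict, List

def target_distribution(rewards: Dict[str, float]) -> Dict[str, float]:
    """p(x) = R(x) / Z, the distribution a perfectly trained GFlowNet samples from."""
    Z = sum(rewards.values())
    return {x: r / Z for x, r in rewards.items()}

def empirical_distribution(samples: List[str], space) -> Dict[str, float]:
    """Observed frequency of each state in `space` among the samples."""
    counts = Counter(samples)
    n = len(samples)
    return {x: counts.get(x, 0) / n for x in space}

def l1_error(samples: List[str], rewards: Dict[str, float]) -> float:
    p = target_distribution(rewards)
    q = empirical_distribution(samples, rewards)
    return sum(abs(p[x] - q[x]) for x in rewards)

def underexplored(samples: List[str], rewards: Dict[str, float],
                  factor: float = 0.5) -> List[str]:
    """States sampled less than `factor` times as often as their reward share predicts."""
    p = target_distribution(rewards)
    q = empirical_distribution(samples, rewards)
    return [x for x in rewards if q[x] < factor * p[x]]
```

Aggregate metrics like `l1_error` only say that sampling is off; listing the under-sampled states points at where, which is the kind of question the views described above are meant to answer interactively.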
LG · Oct 2, 2025
Catalyst GFlowNet for electrocatalyst design: A hydrogen evolution reaction case study
Lena Podina, Christina Humer, Alexandre Duval et al.
Efficient and inexpensive energy storage is essential for accelerating the adoption of renewable energy and ensuring a stable supply, despite fluctuations in sources such as wind and solar. Electrocatalysts play a key role in hydrogen energy storage (HES), allowing the energy to be stored as hydrogen. However, the development of affordable and high-performance catalysts for this process remains a significant challenge. We introduce Catalyst GFlowNet, a generative model that leverages machine learning-based predictors of formation and adsorption energy to design crystal surfaces that act as efficient catalysts. We demonstrate the performance of the model through a proof-of-concept application to the hydrogen evolution reaction, a key reaction in HES, for which we successfully identified platinum as the most efficient known catalyst. In future work, we aim to extend this approach to the oxygen evolution reaction, where current optimal catalysts are expensive metal oxides, and open the search space to discover new materials. This generative modeling framework offers a promising pathway for accelerating the search for novel and efficient catalysts.
CV · Jun 25, 2024
EvolvED: Evolutionary Embeddings to Understand the Generation Process of Diffusion Models
Vidya Prasad, Hans van Gorp, Christina Humer et al.
Diffusion models, widely used in image generation, rely on iterative refinement to generate images from noise. Understanding this data evolution is important for model development and interpretability, yet challenging due to its high-dimensional, iterative nature. Prior works often focus on static or instance-level analyses, missing the iterative and holistic aspects of the generative path. While dimensionality reduction can visualize image evolution for a few instances, it does not preserve the iterative structure. To address these gaps, we introduce EvolvED, a method that presents a holistic view of the iterative generative process in diffusion models. EvolvED goes beyond instance exploration by leveraging predefined research questions to streamline generative space exploration. Tailored prompts aligned with these questions are used to extract intermediate images, preserving iterative context. Targeted feature extractors trace the evolution of key image attributes, addressing the complexity of high-dimensional outputs. Central to EvolvED is a novel evolutionary embedding algorithm that encodes iterative steps while maintaining semantic relations. It enhances the visualization of data evolution by clustering semantically similar elements within each iteration with t-SNE, grouping elements by iteration, and aligning an instance's elements across iterations. We present rectilinear and radial layouts to represent iterations and support exploration. We apply EvolvED to diffusion models such as GLIDE and Stable Diffusion, demonstrating its ability to provide valuable insights into the generative process.