Nils Gehlenborg

HC
h-index: 19
12 papers
159 citations
Novelty: 29%
AI Score: 50

12 Papers

HC · May 29
Agentic Authoring of Interactive Multiview Visualizations in Genomics

Astrid van den Brandt, Kiroong Choe, Sehi L'Yi et al.

Diverse genomics data, scientific questions, and analysis tasks typically demand highly specialized visualizations. Therefore, users often must customize or author new ones tailored to their data. Existing tools are usually either limited in customization or require substantial learning or programming, and even expressive tools assume visualization expertise many users lack. Agentic and large language model (LLM) approaches are increasingly applied to complex scientific tasks, including visualization. Natural-language conversational interfaces offer a promising path to democratizing the authoring of complex visualizations. In the context of genomics, these approaches face additional challenges: genomics visualizations typically integrate heterogeneous data types and are composed of multiple linked interactive views. These challenges motivate more structured LLM-based schemes. We first characterize where vanilla LLM generation succeeds and fails for genomics visualization, identifying eight quality dimensions. We then compare six schemes (direct generation, a fixed pipeline, and four agentic configurations varying in the number of specialist agents and the presence of a reviewer) across 159 cases spanning three levels of query ambiguity and specification complexity. All schemes use the Gosling visualization grammar as structured output. Agentic iteration substantially improves perceived quality over both baselines, while more complex agent architectures yield no additional benefit. We discuss implications for designing agentic systems for domain-specific visualization authoring. All supplemental materials are available at https://osf.io/uqe83.

HC · May 13
Pluot: Towards 'write once, run everywhere' visualization software

Mark S. Keller, Nils Gehlenborg

Tools used for implementing visualization software systems can generally be divided into camps such as static versus interactive and desktop versus web-based. We contribute Pluot, an architecture that bridges these divides, enabling a single software implementation of a visualization to be used regardless of the target level of interactivity or computing environment. With Pluot, a visualization developer implements a given visualization rendering function once, using the Rust programming language. Then, bindings to the Rust program can be generated to enable reproducible execution of the rendering function from other languages, such as Python or JavaScript. Pluot can render visualizations to bitmap or vector graphics format, bridging gaps between interactive performance and publication-quality figure creation. The software is available at https://pluot.dev.

GN · Sep 19, 2025 · Code
GQVis: A Dataset of Genomics Data Questions and Visualizations for Generative AI

Skylar Sargent Walters, Arthea Valderrama, Thomas C. Smits et al.

Data visualization is a fundamental tool in genomics research, enabling the exploration, interpretation, and communication of complex genomic features. While machine learning models show promise for transforming data into insightful visualizations, current models lack the training foundation for domain-specific tasks. In an effort to provide a foundational resource for genomics-focused model training, we present a framework for generating a dataset that pairs abstract, low-level questions about genomics data with corresponding visualizations. Building on prior work with statistical plots, our approach adapts to the complexity of genomics data and the specialized representations used to depict them. We further incorporate multiple linked queries and visualizations, along with justifications for design choices, figure captions, and image alt-texts for each item in the dataset. We use genomics data retrieved from three distinct genomics data repositories (4DN, ENCODE, Chromoscope) to produce GQVis: a dataset consisting of 1.14 million single-query data points, 628k query pairs, and 589k query chains. The GQVis dataset and generation code are available at https://huggingface.co/datasets/HIDIVE/GQVis and https://github.com/hms-dbmi/GQVis-Generation.

HC · May 9
Sycamore: Characterizing Synthetic Personas for Evaluating Genomics Visualization Retrieval

Huyen N. Nguyen, Astrid van den Brandt, Nils Gehlenborg

Evaluating visualization systems in niche domains such as genomics is challenging due to scarcity of domain experts and difficulty recruiting a representative user base. While LLM-based synthetic personas are increasingly used to ease evaluation bottlenecks, they face well-founded skepticism. Rather than weighing synthetic personas as substitutes for real users, we ask a fundamental open question: when synthetic personas evaluate a real visualization system, what do they actually produce, and how does that output change when grounded in documented human contexts? We present Sycamore, an exploratory three-condition probe design using Geranium, a search engine for multimodal genomics visualization, as a case study. Sycamore evaluates Geranium using: (1) ungrounded synthetic personas from generic LLM priors; (2) grounded synthetic personas constrained by voice-of-customer artifacts from a prior interview study; and (3) a published baseline study of real domain experts. We observe that grounding shifts synthetic feedback toward the language and concerns of documented users, while ungrounded evaluators drift toward operational specifics that real participants did not raise; both synthetic conditions, however, converge on a find-and-adapt frame and miss the image-modality preference observed in the expert study. We discuss what these observations imply for where synthetic personas might fit alongside expert studies in domain-specific visualization evaluation. All supplemental materials are available at https://osf.io/kdfr3/.

HC · Mar 6
Visualization Retrieval for Data Literacy: Position Paper

Huyen N. Nguyen, Nils Gehlenborg

Current resources for data literacy education, such as visualization galleries and datasets, provide useful examples but lack mechanisms for learners to query, compare, and navigate the visualization design space efficiently. This position paper advocates for visualization retrieval as essential infrastructure for data literacy, transforming static collections into dynamic, inquiry-based learning environments. We analyze the role of retrieval across the data lifecycle, demonstrating how it facilitates design space exploration and vocabulary expansion, supports data consumption through visualization comparison and critique, and aids data management via resource curation. We outline key opportunities for future research and system design, including integrated retrieval-authoring environments, pedagogical relevance modeling, and collaborative educational corpora. Ultimately, we argue that visualization retrieval systems empower learners to articulate intent, bridge technical barriers, and proactively reason with data.

HC · Oct 18, 2025
Safire: Similarity Framework for Visualization Retrieval

Huyen N. Nguyen, Nils Gehlenborg

Effective visualization retrieval necessitates a clear definition of similarity. Despite the growing body of work in specialized visualization retrieval systems, a systematic approach to understanding visualization similarity remains absent. We introduce the Similarity Framework for Visualization Retrieval (Safire), a conceptual model that frames visualization similarity along two dimensions: comparison criteria and representation modalities. Comparison criteria identify the aspects that make visualizations similar, which we divide into primary facets (data, visual encoding, interaction, style, metadata) and derived properties (data-centric and human-centric measures). Safire connects what to compare with how comparisons are executed through representation modalities. We categorize existing representation approaches into four groups based on their levels of information content and visualization determinism: raster image, vector image, specification, and natural language description, together guiding what is computable and comparable. We analyze several visualization retrieval systems using Safire to demonstrate its practical value in clarifying similarity considerations. Our findings reveal how particular criteria and modalities align across different use cases. Notably, the choice of representation modality is not only an implementation detail but also an important decision that shapes retrieval capabilities and limitations. Based on our analysis, we provide recommendations and discuss broader implications for multimodal learning, AI applications, and visualization reproducibility.

HC · Sep 23, 2025
YAC: Bridging Natural Language and Interactive Visual Exploration with Generative AI for Biomedical Data Discovery

Devin Lange, Shanghua Gao, Pengwei Sui et al.

Incorporating natural language input has the potential to improve the capabilities of biomedical data discovery interfaces. However, user interface elements and visualizations are still powerful tools for interacting with data, even in the new world of generative AI. In our prototype system, YAC, Yet Another Chatbot, we bridge the gap between natural language and interactive visualizations by generating structured declarative output with a multi-agent system and interpreting that output to render linked interactive visualizations and apply data filters. Furthermore, we include widgets, which allow users to adjust the values of that structured output through user interface elements. We reflect on the capabilities and design of this system with an analysis of its technical dimensions and illustrate the capabilities through four usage scenarios.

HC · Sep 19, 2025
A Generative AI System for Biomedical Data Discovery with Grammar-Based Visualizations

Devin Lange, Shanghua Gao, Pengwei Sui et al.

We explore the potential for combining generative AI with grammar-based visualizations for biomedical data discovery. In our prototype, we use a multi-agent system to generate visualization specifications and apply filters. These visualizations are linked together, resulting in an interactive dashboard that is progressively constructed. Our system leverages the strengths of natural language while maintaining the utility of traditional user interfaces. Furthermore, we utilize generated interactive widgets enabling user adjustment. Finally, we demonstrate the potential utility of this system for biomedical data discovery with a case study.

HC · May 1, 2020
A Generic Framework and Library for Exploration of Small Multiples through Interactive Piling

Fritz Lekschas, Xinyi Zhou, Wei Chen et al.

Small multiples are miniature representations of visual information used generically across many domains. Handling large numbers of small multiples imposes challenges on many analytic tasks like inspection, comparison, navigation, or annotation. To address these challenges, we developed a framework and implemented a library called Piling.js for designing interactive piling interfaces. Based on the piling metaphor, such interfaces afford flexible organization, exploration, and comparison of large numbers of small multiples by interactively aggregating visual objects into piles. Based on a systematic analysis of previous work, we present a structured design space to guide the design of visual piling interfaces. To enable designers to efficiently build their own visual piling interfaces, Piling.js provides a declarative interface to avoid having to write low-level code and implements common aspects of the design space. An accompanying GUI additionally supports the dynamic configuration of the piling interface. We demonstrate the expressiveness of Piling.js with examples from machine learning, immunofluorescence microscopy, genomics, and public health.

HC · Jun 18, 2019
Periphery Plots for Contextualizing Heterogeneous Time-Based Charts

Bryce Morrow, Trevor Manz, Arlene E. Chung et al.

Patterns in temporal data can often be found across different scales, such as days, weeks, and months, making effective visualization of time-based data challenging. Here we propose a new approach for providing focus and context in time-based charts to enable interpretation of patterns across time scales. Our approach employs a focus zone with a time axis and a second axis that can represent either quantities or categories, as well as a set of adjacent periphery plots that can aggregate data along the time dimension, the value dimension, or both. We present a framework for periphery plots and describe two use cases that demonstrate the utility of our approach.

GN · May 8, 2019
Tasks, Techniques, and Tools for Genomic Data Visualization

Sabrina Nusrat, Theresa Harbig, Nils Gehlenborg

Genomic data visualization is essential for interpretation and hypothesis generation as well as a valuable aid in communicating discoveries. Visual tools bridge the gap between algorithmic approaches and the cognitive skills of investigators. Addressing this need has become crucial in genomics, as biomedical research is increasingly data-driven and many studies lack well-defined hypotheses. A key challenge in data-driven research is to discover unexpected patterns and to formulate hypotheses in an unbiased manner in vast amounts of genomic and other associated data. Over the past two decades, this has driven the development of numerous data visualization techniques and tools for visualizing genomic data. Based on a comprehensive literature survey, we propose taxonomies for data, visualization, and tasks involved in genomic data visualization. Furthermore, we provide a comprehensive review of published genomic visualization tools in the context of the proposed taxonomies.

IR · Jul 3, 2018
Visual Pattern-Driven Exploration of Big Data

Michael Behrisch, Robert Krueger, Fritz Lekschas et al.

Pattern extraction algorithms are enabling insights into the ever-growing amount of today's datasets by translating recurring data properties into compact representations. Yet, a practical problem arises: as data volumes and complexity increase, so does the number of patterns, leaving the analyst with a vast result space. Current algorithmic and especially visualization approaches often fail to answer central overview questions essential for a comprehensive understanding of pattern distributions and support, their quality, and relevance to the analysis task. To address these challenges, we contribute a visual analytics pipeline targeted at the pattern-driven exploration of result spaces in a semi-automatic fashion. Specifically, we combine image feature analysis and unsupervised learning to partition the pattern space into interpretable, coherent chunks, which should be given priority in a subsequent in-depth analysis. In our analysis scenarios, no ground truth is given. Thus, we employ and evaluate novel quality metrics derived from the distance distributions of our image feature vectors and the derived cluster model to guide the feature selection process. We visualize our results interactively, allowing the user to drill down from overview to detail into the pattern space, and demonstrate our techniques in a case study on biomedical genomic data.