GN · Sep 19, 2025
GQVis: A Dataset of Genomics Data Questions and Visualizations for Generative AI
Skylar Sargent Walters, Arthea Valderrama, Thomas C. Smits et al.
Data visualization is a fundamental tool in genomics research, enabling the exploration, interpretation, and communication of complex genomic features. While machine learning models show promise for transforming data into insightful visualizations, current models lack the training foundation for domain-specific tasks. In an effort to provide a foundational resource for genomics-focused model training, we present a framework for generating a dataset that pairs abstract, low-level questions about genomics data with corresponding visualizations. Building on prior work with statistical plots, our approach adapts to the complexity of genomics data and the specialized representations used to depict them. We further incorporate multiple linked queries and visualizations, along with justifications for design choices, figure captions, and image alt-texts for each item in the dataset. We use genomics data retrieved from three distinct genomics data repositories (4DN, ENCODE, Chromoscope) to produce GQVis: a dataset consisting of 1.14 million single-query data points, 628k query pairs, and 589k query chains. The GQVis dataset and generation code are available at https://huggingface.co/datasets/HIDIVE/GQVis and https://github.com/hms-dbmi/GQVis-Generation.
HC · May 9
Sycamore: Characterizing Synthetic Personas for Evaluating Genomics Visualization Retrieval
Huyen N. Nguyen, Astrid van den Brandt, Nils Gehlenborg
Evaluating visualization systems in niche domains such as genomics is challenging due to scarcity of domain experts and difficulty recruiting a representative user base. While LLM-based synthetic personas are increasingly used to ease evaluation bottlenecks, they face well-founded skepticism. Rather than weighing synthetic personas as substitutes for real users, we ask a fundamental open question: when synthetic personas evaluate a real visualization system, what do they actually produce, and how does that output change when grounded in documented human contexts? We present Sycamore, an exploratory three-condition probe design using Geranium, a search engine for multimodal genomics visualization, as a case study. Sycamore evaluates Geranium using: (1) ungrounded synthetic personas from generic LLM priors; (2) grounded synthetic personas constrained by voice-of-customer artifacts from a prior interview study; and (3) a published baseline study of real domain experts. We observe that grounding shifts synthetic feedback toward the language and concerns of documented users, while ungrounded evaluators drift toward operational specifics that real participants did not raise; both synthetic conditions, however, converge on a find-and-adapt frame and miss the image-modality preference observed in the expert study. We discuss what these observations imply for where synthetic personas might fit alongside expert studies in domain-specific visualization evaluation. All supplemental materials are available at https://osf.io/kdfr3/.
HC · Mar 6
Visualization Retrieval for Data Literacy: Position Paper
Huyen N. Nguyen, Nils Gehlenborg
Current resources for data literacy education, such as visualization galleries and datasets, provide useful examples but lack mechanisms for learners to query, compare, and navigate the visualization design space efficiently. This position paper advocates for visualization retrieval as essential infrastructure for data literacy, transforming static collections into dynamic, inquiry-based learning environments. We analyze the role of retrieval across the data lifecycle, demonstrating how it facilitates design space exploration and vocabulary expansion, supports data consumption through visualization comparison and critique, and aids data management via resource curation. We outline key opportunities for future research and system design, including integrated retrieval-authoring environments, pedagogical relevance modeling, and collaborative educational corpora. Ultimately, we argue that visualization retrieval systems empower learners to articulate intent, bridge technical barriers, and proactively reason with data.
HC · Oct 18, 2025
Safire: Similarity Framework for Visualization Retrieval
Huyen N. Nguyen, Nils Gehlenborg
Effective visualization retrieval necessitates a clear definition of similarity. Despite the growing body of work in specialized visualization retrieval systems, a systematic approach to understanding visualization similarity remains absent. We introduce the Similarity Framework for Visualization Retrieval (Safire), a conceptual model that frames visualization similarity along two dimensions: comparison criteria and representation modalities. Comparison criteria identify the aspects that make visualizations similar, which we divide into primary facets (data, visual encoding, interaction, style, metadata) and derived properties (data-centric and human-centric measures). Safire connects what to compare with how comparisons are executed through representation modalities. We categorize existing representation approaches into four groups based on their levels of information content and visualization determinism: raster image, vector image, specification, and natural language description, together guiding what is computable and comparable. We analyze several visualization retrieval systems using Safire to demonstrate its practical value in clarifying similarity considerations. Our findings reveal how particular criteria and modalities align across different use cases. Notably, the choice of representation modality is not only an implementation detail but also an important decision that shapes retrieval capabilities and limitations. Based on our analysis, we provide recommendations and discuss broader implications for multimodal learning, AI applications, and visualization reproducibility.
HC · Jul 22, 2021
VisMCA: A Visual Analytics System for Misclassification Correction and Analysis. VAST Challenge 2020, Mini-Challenge 2 Award: Honorable Mention for Detailed Analysis of Patterns of Misclassification
Huyen N. Nguyen, Jake Gonzalez, Jian Guo et al.
This paper presents VisMCA, an interactive visual analytics system that deepens users' understanding of ML results, augments their ability to correct misclassifications, and supports analysis of underlying patterns, in response to the VAST Challenge 2020 Mini-Challenge 2. VisMCA facilitates provenance tracking and provides a comprehensive view of object detection results, easing re-labeling and producing reliable, corrected data for future training. Our solution implements multiple analytical views to offer deep insight for underlying pattern discovery.
HC · Oct 4, 2020
Interface Design for HCI Classroom: From Learners' Perspective
Huyen N. Nguyen, Vinh T. Nguyen, Tommy Dang
Achieving good Human-Computer Interaction (HCI) design is challenging. Previous work has contributed significantly to fostering HCI, including studies of design principles reported from the instructor's view. How, and to what extent, students perceive the design principles remains an open question. To answer it, this paper presents a study of HCI adoption in the classroom. The studio-based learning method was adapted to teach 83 graduate and undergraduate students over 16 weeks with four activities. A standalone presentation tool for instant online peer feedback during the presentation session was developed to help students justify and critique others' work. Our tool provides a sandbox that supports multiple application types, including Web applications, Object Detection, Web-based Virtual Reality (VR), and Augmented Reality (AR). After students presented one assignment and two projects, our results showed that they acquired a better understanding of the Golden Rules principle over time, as demonstrated by the development of their visual interface designs. The word cloud reveals that the primary focus was on the user interface and sheds some light on students' interest in user experience. The inter-rater score indicates agreement among students, suggesting that they have the same level of understanding of the principles. The results show a high level of compliance with HCI principles, within which we observed variations in visual cognitive styles. Regardless of diversity in visual preference, the students showed high consistency and a similar perspective on adopting HCI design principles. The results also yielded suggestions for the future development of the HCI curriculum.
IR · Oct 20, 2019
EQSA: Earthquake Situational Analytics from Social Media
Huyen N. Nguyen, Tommy Dang
This paper introduces EQSA, an interactive exploratory tool for earthquake situational analytics using social media. EQSA is designed to help users characterize conditions across the area around the earthquake zone in terms of related events, resources to be allocated, and responses from the community. At the general level, changes in the volume of messages from chosen categories are presented, giving users a general idea of the condition. More in-depth analysis is provided through topic evolution, community visualization, and location representation. EQSA is developed with intuitive, interactive features and multiple linked views that visualize social media data and help users gain comprehensive insight into the situation. In this paper, we present the application of EQSA to the VAST Challenge 2019: Mini-Challenge 3 (MC3) dataset.