Haichao Miao

AI
h-index49
11papers
95citations
Novelty39%
AI Score51

11 Papers

AIJun 4Code
SciVisAgentSkills: Design and Evaluation of Agent Skills for Scientific Data Analysis and Visualization

Kuangshi Ai, Haichao Miao, Kaiyuan Tang et al.

Recent advances in agentic visualization have enabled the translation of natural language into executable scientific visualization (SciVis) workflows. While general-purpose coding agents show strong capabilities, they often lack the tool-specific expertise required for SciVis tasks. In this work, we present SciVisAgentSkills, a collection of reusable agent skills that augment coding agents for scientific data analysis and visualization by encoding environment assumptions, tool usage patterns, and domain heuristics across scientific tools such as ParaView, napari, VMD, and TTK. We evaluate these skills on Codex and Claude Code using SciVisAgentBench, a benchmark of 108 expert-designed multi-step tasks. Results show that agent skills improve mean task scores across the evaluated suites, with token-efficiency benefits that depend on the agent harness and tool setting. These findings highlight the importance of structured procedural knowledge for enabling reliable, long-horizon SciVis workflows, while also showing that skills should be studied alongside the execution harness that loads and applies them. The skills are available at https://github.com/KuangshiAi/SciVisAgentSkills.

AIMar 31
SciVisAgentBench: A Benchmark for Evaluating Scientific Data Analysis and Visualization Agents

Kuangshi Ai, Haichao Miao, Kaiyuan Tang et al.

Recent advances in large language models (LLMs) have enabled agentic systems that translate natural language intent into executable scientific visualization (SciVis) tasks. Despite rapid progress, the community lacks a principled and reproducible benchmark for evaluating these emerging SciVis agents in realistic, multi-step analysis settings. We present SciVisAgentBench, a comprehensive and extensible benchmark for evaluating scientific data analysis and visualization agents. Our benchmark is grounded in a structured taxonomy spanning four dimensions: application domain, data type, complexity level, and visualization operation. It currently comprises 108 expert-crafted cases covering diverse SciVis scenarios. To enable reliable assessment, we introduce a multimodal outcome-centric evaluation pipeline that combines LLM-based judging with deterministic evaluators, including image-based metrics, code checkers, rule-based verifiers, and case-specific evaluators. We also conduct a validity study with 12 SciVis experts to examine the agreement between human and LLM judges. Using this framework, we evaluate representative SciVis agents and general-purpose coding agents to establish initial baselines and reveal capability gaps. SciVisAgentBench is designed as a living benchmark to support systematic comparison, diagnose failure modes, and drive progress in agentic SciVis. The benchmark is available at https://scivisagentbench.github.io/.

QMApr 18, 2022
SynopSet: Multiscale Visual Abstraction Set for Explanatory Analysis of DNA Nanotechnology Simulations

Deng Luo, Alexandre Kouyoumdjian, Ondřej Strnad et al.

We propose a new abstraction set (SynopSet) that has a continuum of visual representations for the explanatory analysis of molecular dynamics simulations (MDS) in the DNA nanotechnology domain. By re-purposing the commonly used progress bar and designing novel visuals, as well as transforming the data from the domain format to a format that better fits the newly designed visuals, we compose this new set of representations. This set is also designed to be capable of showing all spatial and temporal details, and all structural complexity, or abstracting these to various degrees, enabling both the slow playback of the simulation for detailed examinations or very fast playback for an overview that helps to efficiently identify events of interest, as well as several intermediate levels between these two extremes. For any pair of successive representations, we demonstrate smooth, continuous transitions, enabling users to keep track of relevant information from one representation to the next. By providing multiple representations suited to different temporal resolutions and connected by smooth transitions, we enable time-efficient simulation analysis, giving users the opportunity to examine and present important phases in great detail, or leverage abstract representations to go over uneventful phases much faster. Domain experts can thus gain actionable insight about their simulations and communicate it in a much shorter time. Further, the novel representations are more intuitive and also enable researchers unfamiliar with MDS analysis graphs to better understand the simulation results. We assessed the effectiveness of SynopSet on 12 DNA nanostructure simulations together with a domain expert. We have also shown that our set of representations can be systematically located in a visualization space, dubbed SynopSpace.

AIMay 20
Toward AI VIS Co-Scientists: A General and End-to-End Agent Harness for Solving Complex Data Visualization Tasks

Haichao Miao, Zhimin Li, Kuangshi Ai et al.

The ability to inspect, interpret, and communicate complex data is crucial for virtually any scientific endeavor, but often requires significant expertise outside the core domain ranging from data management and analysis to visualization design and implementation. We present an end-to-end agentic harness that, based on only the data and a high level description of the tasks, independently designs custom visual analysis applications (VIS apps). This represents an important step towards a general AI co-scientist envisioned by many as an autonomous system that can autonomously execute long horizon tasks based on high-level directions. Our proposed VIS co-scientist is an essential component of this broader AI co-scientist vision: a harness that can autonomously analyze data and design visualization solutions using a collection of agents and specialized skills that coordinate exploratory analysis, plan, configure the environment, implement, validate the interface, and most importantly evaluate the overall task completion. Each stage produces document and instruction artifacts that guide downstream work and enable iterative refinement. We validate this approach on IEEE SciVis Contests spanning multiple science and engineering fields. These contests serve as ideal proving grounds because they encode real-world complexity: ambiguous requirements, diverse data modalities, design trade-offs, and task-driven validation. Given only the data and target tasks, our system autonomously produces functional single-page VIS Apps with verified linked-view behavior, highly customized to domain experts' specified tasks and needs.

AIApr 30
Exploring Interaction Paradigms for LLM Agents in Scientific Visualization

Jackson Vonderhorst, Kuangshi Ai, Haichao Miao et al.

This paper examines how different types of large language model (LLM) agents perform on scientific visualization (SciVis) tasks, where users generate visualization workflows from natural-language instructions. We compare three primary interaction paradigms, including domain-specific agents with structured tool use, computer-use agents, and general-purpose coding agents, by evaluating eight representative agents across 15 benchmark tasks and measuring visualization quality, efficiency, robustness, and computational cost. We further analyze interaction modalities, including code scripts and model context protocol (MCP) or API calls for structured tool use, as well as command-line interfaces (CLI) and graphical user interfaces (GUI) for more general interaction, while additionally studying the effect of persistent memory in selected agents. The results reveal clear tradeoffs across paradigms and modalities. General-purpose coding agents achieve the highest task success rates but are computationally expensive, while domain-specific agents are more efficient and stable but less flexible. Computer-use agents perform well on individual steps but struggle with longer multi-step workflows, indicating that long-horizon planning is their primary limitation. Across both CLI- and GUI-based settings, persistent memory improves performance over repeated trials, although its benefits depend on the underlying interaction mode and the quality of feedback. These findings suggest that no single approach is sufficient, and future SciVis systems should combine structured tool use, interactive capabilities, and adaptive memory mechanisms to balance performance, robustness, and flexibility.

HCDec 7, 2023
AVA: Towards Autonomous Visualization Agents through Visual Perception-Driven Decision-Making

Shusen Liu, Haichao Miao, Zhimin Li et al.

With recent advances in multi-modal foundation models, the previously text-only large language models (LLM) have evolved to incorporate visual input, opening up unprecedented opportunities for various applications in visualization. Our work explores the utilization of the visual perception ability of multi-modal LLMs to develop Autonomous Visualization Agents (AVAs) that can interpret and accomplish user-defined visualization objectives through natural language. We propose the first framework for the design of AVAs and present several usage scenarios intended to demonstrate the general applicability of the proposed paradigm. The addition of visual perception allows AVAs to act as the virtual visualization assistant for domain experts who may lack the knowledge or expertise in fine-tuning visualization outputs. Our preliminary exploration and proof-of-concept agents suggest that this approach can be widely applicable whenever the choices of appropriate visualization parameters require the interpretation of previous visual output. Feedback from unstructured interviews with experts in AI research, medical visualization, and radiology has been incorporated, highlighting the practicality and potential of AVAs. Our study indicates that AVAs represent a general paradigm for designing intelligent visualization systems that can achieve high-level visualization goals, which pave the way for developing expert-level visualization agents in the future.

HCApr 14, 2025
See or Recall: A Sanity Check for the Role of Vision in Solving Visualization Question Answer Tasks with Multimodal LLMs

Zhimin Li, Haichao Miao, Xinyuan Yan et al.

Recent developments in multimodal large language models (MLLM) have equipped language models to reason about vision and language jointly. This permits MLLMs to both perceive and answer questions about data visualization across a variety of designs and tasks. Applying MLLMs to a broad range of visualization tasks requires us to properly evaluate their capabilities, and the most common way to conduct evaluation is through measuring a model's visualization reasoning capability, analogous to how we would evaluate human understanding of visualizations (e.g., visualization literacy). However, we found that in the context of visualization question answering (VisQA), how an MLLM perceives and reasons about visualizations can be fundamentally different from how humans approach the same problem. During the evaluation, even without visualization, the model could correctly answer a substantial portion of the visualization test questions, regardless of whether any selection options were provided. We hypothesize that the vast amount of knowledge encoded in the language model permits factual recall that supersedes the need to seek information from the visual signal. It raises concerns that the current VisQA evaluation may not fully capture the models' visualization reasoning capabilities. To address this, we propose a comprehensive sanity check framework that integrates a rule-based decision tree and a sanity check table to disentangle the effects of "seeing" (visual processing) and "recall" (reliance on prior knowledge). This validates VisQA datasets for evaluation, highlighting where models are truly "seeing", positively or negatively affected by the factual recall, or relying on inductive biases for question answering. Our study underscores the need for careful consideration in designing future visualization understanding studies when utilizing MLLMs.

HCMay 11, 2025
ParaView-MCP: An Autonomous Visualization Agent with Direct Tool Use

Shusen Liu, Haichao Miao, Peer-Timo Bremer

While powerful and well-established, tools like ParaView present a steep learning curve that discourages many potential users. This work introduces ParaView-MCP, an autonomous agent that integrates modern multimodal large language models (MLLMs) with ParaView to not only lower the barrier to entry but also augment ParaView with intelligent decision support. By leveraging the state-of-the-art reasoning, command execution, and vision capabilities of MLLMs, ParaView-MCP enables users to interact with ParaView through natural language and visual inputs. Specifically, our system adopted the Model Context Protocol (MCP) - a standardized interface for model-application communication - that facilitates direct interaction between MLLMs with ParaView's Python API to allow seamless information exchange between the user, the language model, and the visualization tool itself. Furthermore, by implementing a visual feedback mechanism that allows the agent to observe the viewport, we unlock a range of new capabilities, including recreating visualizations from examples, closed-loop visualization parameter updates based on user-defined goals, and even cross-application collaboration involving multiple tools. Broadly, we believe such an agent-driven visualization paradigm can profoundly change the way we interact with visualization tools. We expect a significant uptake in the development of such visualization tools, in both visualization research and industry.

HCSep 18, 2025
An Evaluation-Centric Paradigm for Scientific Visualization Agents

Kuangshi Ai, Haichao Miao, Zhimin Li et al.

Recent advances in multi-modal large language models (MLLMs) have enabled increasingly sophisticated autonomous visualization agents capable of translating user intentions into data visualizations. However, measuring progress and comparing different agents remains challenging, particularly in scientific visualization (SciVis), due to the absence of comprehensive, large-scale benchmarks for evaluating real-world capabilities. This position paper examines the various types of evaluation required for SciVis agents, outlines the associated challenges, provides a simple proof-of-concept evaluation example, and discusses how evaluation benchmarks can facilitate agent self-improvement. We advocate for a broader collaboration to develop a SciVis agentic evaluation benchmark that would not only assess existing capabilities but also drive innovation and stimulate future development in the field.

CLJun 24, 2024
Visualization Literacy of Multimodal Large Language Models: A Comparative Study

Zhimin Li, Haichao Miao, Valerio Pascucci et al.

The recent introduction of multimodal large language models (MLLMs) combine the inherent power of large language models (LLMs) with the renewed capabilities to reason about the multimodal context. The potential usage scenarios for MLLMs significantly outpace their text-only counterparts. Many recent works in visualization have demonstrated MLLMs' capability to understand and interpret visualization results and explain the content of the visualization to users in natural language. In the machine learning community, the general vision capabilities of MLLMs have been evaluated and tested through various visual understanding benchmarks. However, the ability of MLLMs to accomplish specific visualization tasks based on visual perception has not been properly explored and evaluated, particularly, from a visualization-centric perspective. In this work, we aim to fill the gap by utilizing the concept of visualization literacy to evaluate MLLMs. We assess MLLMs' performance over two popular visualization literacy evaluation datasets (VLAT and mini-VLAT). Under the framework of visualization literacy, we develop a general setup to compare different multimodal large language models (e.g., GPT4-o, Claude 3 Opus, Gemini 1.5 Pro) as well as against existing human baselines. Our study demonstrates MLLMs' competitive performance in visualization literacy, where they outperform humans in certain tasks such as identifying correlations, clusters, and hierarchical structures.

GRJul 29, 2019
ScaleTrotter: Illustrative Visual Travels Across Negative Scales

Sarkis Halladjian, Haichao Miao, David Kouřil et al.

We present ScaleTrotter, a conceptual framework for an interactive, multi-scale visualization of biological mesoscale data and, specifically, genome data. ScaleTrotter allows viewers to smoothly transition from the nucleus of a cell to the atomistic composition of the DNA, while bridging several orders of magnitude in scale. The challenges in creating an interactive visualization of genome data are fundamentally different in several ways from those in other domains like astronomy that require a multi-scale representation as well. First, genome data has intertwined scale levels---the DNA is an extremely long, connected molecule that manifests itself at all scale levels. Second, elements of the DNA do not disappear as one zooms out---instead the scale levels at which they are observed group these elements differently. Third, we have detailed information and thus geometry for the entire dataset and for all scale levels, posing a challenge for interactive visual exploration. Finally, the conceptual scale levels for genome data are close in scale space, requiring us to find ways to visually embed a smaller scale into a coarser one. We address these challenges by creating a new multi-scale visualization concept. We use a scale-dependent camera model that controls the visual embedding of the scales into their respective parents, the rendering of a subset of the scale hierarchy, and the location, size, and scope of the view. In traversing the scales, ScaleTrotter is roaming between 2D and 3D visual representations that are depicted in integrated visuals. We discuss, specifically, how this form of multi-scale visualization follows from the specific characteristics of the genome data and describe its implementation. Finally, we discuss the implications of our work to the general illustrative depiction of multi-scale data.