Shusen Liu

LG
h-index49
36papers
252citations
Novelty46%
AI Score54

36 Papers

98.4AIJun 4Code
SciVisAgentSkills: Design and Evaluation of Agent Skills for Scientific Data Analysis and Visualization

Kuangshi Ai, Haichao Miao, Kaiyuan Tang et al.

Recent advances in agentic visualization have enabled the translation of natural language into executable scientific visualization (SciVis) workflows. While general-purpose coding agents show strong capabilities, they often lack the tool-specific expertise required for SciVis tasks. In this work, we present SciVisAgentSkills, a collection of reusable agent skills that augment coding agents for scientific data analysis and visualization by encoding environment assumptions, tool usage patterns, and domain heuristics across scientific tools such as ParaView, napari, VMD, and TTK. We evaluate these skills on Codex and Claude Code using SciVisAgentBench, a benchmark of 108 expert-designed multi-step tasks. Results show that agent skills improve mean task scores across the evaluated suites, with token-efficiency benefits that depend on the agent harness and tool setting. These findings highlight the importance of structured procedural knowledge for enabling reliable, long-horizon SciVis workflows, while also showing that skills should be studied alongside the execution harness that loads and applies them. The skills are available at https://github.com/KuangshiAi/SciVisAgentSkills.

LGMar 19, 2023
Cross-GAN Auditing: Unsupervised Identification of Attribute Level Similarities and Differences between Pretrained Generative Models

Matthew L. Olson, Shusen Liu, Rushil Anirudh et al.

Generative Adversarial Networks (GANs) are notoriously difficult to train especially for complex distributions and with limited data. This has driven the need for tools to audit trained networks in human intelligible format, for example, to identify biases or ensure fairness. Existing GAN audit tools are restricted to coarse-grained, model-data comparisons based on summary statistics such as FID or recall. In this paper, we propose an alternative approach that compares a newly developed GAN against a prior baseline. To this end, we introduce Cross-GAN Auditing (xGA) that, given an established "reference" GAN and a newly proposed "client" GAN, jointly identifies intelligible attributes that are either common across both GANs, novel to the client GAN, or missing from the client GAN. This provides both users and model developers an intuitive assessment of similarity and differences between GANs. We introduce novel metrics to evaluate attribute-based GAN auditing approaches and use these metrics to demonstrate quantitatively that xGA outperforms baseline approaches. We also include qualitative results that illustrate the common, novel and missing attributes identified by xGA from GANs trained on a variety of image datasets.

HCJun 16, 2022
"Understanding Robustness Lottery": A Geometric Visual Comparative Analysis of Neural Network Pruning Approaches

Zhimin Li, Shusen Liu, Xin Yu et al.

Deep learning approaches have provided state-of-the-art performance in many applications by relying on large and overparameterized neural networks. However, such networks have been shown to be very brittle and are difficult to deploy on resource-limited platforms. Model pruning, i.e., reducing the size of the network, is a widely adopted strategy that can lead to a more robust and compact model. Many heuristics exist for model pruning, but empirical studies show that some heuristics improve performance whereas others can make models more brittle or have other side effects. This work aims to shed light on how different pruning methods alter the network's internal feature representation and the corresponding impact on model performance. To facilitate a comprehensive comparison and characterization of the high-dimensional model feature space, we introduce a visual geometric analysis of feature representations. We decomposed and evaluated a set of critical geometric concepts from the common adopted classification loss, and used them to design a visualization system to compare and highlight the impact of pruning on model performance and feature representation. The proposed tool provides an environment for in-depth comparison of pruning methods and a comprehensive understanding of how model response to common data corruption. By leveraging the proposed visualization, machine learning researchers can reveal the similarities between pruning methods and redundant in robustness evaluation benchmarks, obtain geometric insights about the differences between pruned models that achieve superior robustness performance, and identify samples that are robust or fragile to model pruning and common data corruption to model pruning and data corruption but also obtain insights and explanations on how some pruned models achieve superior robustness performance.

90.0AIMar 31
SciVisAgentBench: A Benchmark for Evaluating Scientific Data Analysis and Visualization Agents

Kuangshi Ai, Haichao Miao, Kaiyuan Tang et al.

Recent advances in large language models (LLMs) have enabled agentic systems that translate natural language intent into executable scientific visualization (SciVis) tasks. Despite rapid progress, the community lacks a principled and reproducible benchmark for evaluating these emerging SciVis agents in realistic, multi-step analysis settings. We present SciVisAgentBench, a comprehensive and extensible benchmark for evaluating scientific data analysis and visualization agents. Our benchmark is grounded in a structured taxonomy spanning four dimensions: application domain, data type, complexity level, and visualization operation. It currently comprises 108 expert-crafted cases covering diverse SciVis scenarios. To enable reliable assessment, we introduce a multimodal outcome-centric evaluation pipeline that combines LLM-based judging with deterministic evaluators, including image-based metrics, code checkers, rule-based verifiers, and case-specific evaluators. We also conduct a validity study with 12 SciVis experts to examine the agreement between human and LLM judges. Using this framework, we evaluate representative SciVis agents and general-purpose coding agents to establish initial baselines and reveal capability gaps. SciVisAgentBench is designed as a living benchmark to support systematic comparison, diagnose failure modes, and drive progress in agentic SciVis. The benchmark is available at https://scivisagentbench.github.io/.

LGOct 25, 2023
Instance-wise Linearization of Neural Network for Model Interpretation

Zhimin Li, Shusen Liu, Kailkhura Bhavya et al.

Neural network have achieved remarkable successes in many scientific fields. However, the interpretability of the neural network model is still a major bottlenecks to deploy such technique into our daily life. The challenge can dive into the non-linear behavior of the neural network, which rises a critical question that how a model use input feature to make a decision. The classical approach to address this challenge is feature attribution, which assigns an important score to each input feature and reveal its importance of current prediction. However, current feature attribution approaches often indicate the importance of each input feature without detail of how they are actually processed by a model internally. These attribution approaches often raise a concern that whether they highlight correct features for a model prediction. For a neural network model, the non-linear behavior is often caused by non-linear activation units of a model. However, the computation behavior of a prediction from a neural network model is locally linear, because one prediction has only one activation pattern. Base on the observation, we propose an instance-wise linearization approach to reformulates the forward computation process of a neural network prediction. This approach reformulates different layers of convolution neural networks into linear matrix multiplication. Aggregating all layers' computation, a prediction complex convolution neural network operations can be described as a linear matrix multiplication $F(x) = W \cdot x + b$. This equation can not only provides a feature attribution map that highlights the important of the input features but also tells how each input feature contributes to a prediction exactly. Furthermore, we discuss the application of this technique in both supervise classification and unsupervised neural network learning parametric t-SNE dimension reduction.

40.8COMP-PHMar 31
Learning noisy phase transition dynamics from stochastic partial differential equations

Luning Sun, Van Hai Nguyen, Shusen Liu et al.

The non-equilibrium dynamics of mesoscale phase transitions are fundamentally shaped by thermal fluctuations, which not only seed instabilities but actively control kinetic pathways, including rare barrier-crossing events such as nucleation that are entirely inaccessible to deterministic models. Machine-learning surrogates for such systems must therefore represent stochasticity explicitly, enforce conservation laws by construction, and expose physically interpretable structure. We develop physics-aware surrogate models for the stochastic Cahn-Hilliard equation in 3D that satisfy all three requirements simultaneously. The key innovation is to parameterize the surrogate at the level of inter-cell fluxes, decomposing each flux into a deterministic mobility-weighted chemical-potential gradient and a learnable noise amplitude. This design guarantees exact mass conservation at every step and adds physical fluctuations to inter-cell mass transport. A learnable free energy functional provides thermodynamic interpretability, validated by independent recovery of the bulk double-well landscape, interfacial excess energy, and curvature-independent interfacial tension. Tests demonstrate accurate reproduction of ensemble statistics and noise-accelerated coarsening, with generalization to spatial domains 64 times larger in volume and temporal horizons 160x longer than those seen during training. Critically, the stochastic surrogate captures thermally activated nucleation in the metastable regime, a qualitative capability that no deterministic surrogate can provide regardless of training, thus establishing flux-level stochasticity as an architectural necessity rather than an optional enhancement.

CVJun 30, 2023
Topological Data Analysis Guided Segment Anything Model Prompt Optimization for Zero-Shot Segmentation in Biological Imaging

Ruben Glatt, Shusen Liu

Emerging foundation models in machine learning are models trained on vast amounts of data that have been shown to generalize well to new tasks. Often these models can be prompted with multi-modal inputs that range from natural language descriptions over images to point clouds. In this paper, we propose topological data analysis (TDA) guided prompt optimization for the Segment Anything Model (SAM) and show preliminary results in the biological image segmentation domain. Our approach replaces the standard grid search approach that is used in the original implementation and finds point locations based on their topological significance. Our results show that the TDA optimized point cloud is much better suited for finding small objects and massively reduces computational complexity despite the extra step in scenarios which require many segmentations.

47.0DCMay 13
HPC-vQPU: A Service-Export Architecture for Virtual QPUs on Batch-Scheduled HPC Systems

Shusen Liu, Pascal Jahan Elahi, Ugo Varetto

Device-aware quantum simulation increasingly requires HPC-scale accelerators, yet secure supercomputers expose batch-scheduled execution environments rather than the interactive, backend-oriented interfaces expected by quantum software. The key obstacle is not only remote job submission: an HPC-hosted virtual QPU must preserve topology, native-gate, and calibration semantics across queue delay, scheduler allocation, compute-node isolation, and partial execution-side failures, without opening inbound paths into the cluster. We present HPC-vQPU, a service-export architecture for virtual QPUs on batch-scheduled HPC systems. HPC-vQPU separates a cloud-facing control plane, which owns device identity, task lifecycle, snapshot binding, and event projection, from an HPC-resident execution plane, which claims work and realises it through scheduler-backed GPU jobs. Coordination is exclusively outbound and agent initiated. The central abstraction is a topology- and calibration-aware device snapshot bound atomically at claim time and carried into execution as an immutable contract, making each scheduled job hermetic while preserving fresh device semantics. We implement HPC-vQPU at the Pawsey Supercomputing Research Centre using Setonix GPUs, Qiskit-Aer/cuQuantum, and IBM Fez calibration data. Production experiments show that service overhead is bounded and additive, while workload scaling remains confined to the simulator; calibration-bearing snapshots produce measurable output shifts; claim-time binding prevents stale execution after pre-claim device mutation; concurrent agents complete 50/50 tasks exactly once; and explicit recovery restores stale running tasks after agent failure. These results show that secure, scheduler-mediated HPC infrastructure can export device-faithful quantum simulation as an interactive virtual-QPU service.

CVOct 30, 2022
On-the-fly Object Detection using StyleGAN with CLIP Guidance

Yuzhe Lu, Shusen Liu, Jayaraman J. Thiagarajan et al.

We present a fully automated framework for building object detectors on satellite imagery without requiring any human annotation or intervention. We achieve this by leveraging the combined power of modern generative models (e.g., StyleGAN) and recent advances in multi-modal learning (e.g., CLIP). While deep generative models effectively encode the key semantics pertinent to a data distribution, this information is not immediately accessible for downstream tasks, such as object detection. In this work, we exploit CLIP's ability to associate image features with text descriptions to identify neurons in the generator network, which are subsequently used to build detectors on-the-fly.

LGDec 11, 2025
Interpretable and Steerable Concept Bottleneck Sparse Autoencoders

Akshay Kulkarni, Tsui-Wei Weng, Vivek Narayanaswamy et al.

Sparse autoencoders (SAEs) promise a unified approach for mechanistic interpretability, concept discovery, and model steering in LLMs and LVLMs. However, realizing this potential requires that the learned features be both interpretable and steerable. To that end, we introduce two new computationally inexpensive interpretability and steerability metrics and conduct a systematic analysis on LVLMs. Our analysis uncovers two observations; (i) a majority of SAE neurons exhibit either low interpretability or low steerability or both, rendering them ineffective for downstream use; and (ii) due to the unsupervised nature of SAEs, user-desired concepts are often absent in the learned dictionary, thus limiting their practical utility. To address these limitations, we propose Concept Bottleneck Sparse Autoencoders (CB-SAE) - a novel post-hoc framework that prunes low-utility neurons and augments the latent space with a lightweight concept bottleneck aligned to a user-defined concept set. The resulting CB-SAE improves interpretability by +32.1% and steerability by +14.5% across LVLMs and image generation tasks. We will make our code and model weights available.

31.9CVApr 28
LatentDiff: Scaling Semantic Dataset Comparison to Millions of Images

James Flora, Kowshik Thopalli, Akshay R. Kulkarni et al.

We present LatentDiff, a scalable framework for semantic dataset comparison that operates directly in the latent space of pretrained vision encoders. By combining sparse autoencoder-based divergence testing with density ratio estimation, LatentDiff identifies interpretable semantic differences between datasets at a fraction of the computational cost of caption-based alternatives. We also introduce Noisy-Diff, a benchmark capturing realistic sparse distribution shifts that cause existing methods to struggle. Experiments demonstrate that LatentDiff achieves superior accuracy while remaining robust to settings where an extremely small fraction of images (from 5% to <1% ) differ semantically.

75.9AIMay 20
Toward AI VIS Co-Scientists: A General and End-to-End Agent Harness for Solving Complex Data Visualization Tasks

Haichao Miao, Zhimin Li, Kuangshi Ai et al.

The ability to inspect, interpret, and communicate complex data is crucial for virtually any scientific endeavor, but often requires significant expertise outside the core domain ranging from data management and analysis to visualization design and implementation. We present an end-to-end agentic harness that, based on only the data and a high level description of the tasks, independently designs custom visual analysis applications (VIS apps). This represents an important step towards a general AI co-scientist envisioned by many as an autonomous system that can autonomously execute long horizon tasks based on high-level directions. Our proposed VIS co-scientist is an essential component of this broader AI co-scientist vision: a harness that can autonomously analyze data and design visualization solutions using a collection of agents and specialized skills that coordinate exploratory analysis, plan, configure the environment, implement, validate the interface, and most importantly evaluate the overall task completion. Each stage produces document and instruction artifacts that guide downstream work and enable iterative refinement. We validate this approach on IEEE SciVis Contests spanning multiple science and engineering fields. These contests serve as ideal proving grounds because they encode real-world complexity: ambiguous requirements, diverse data modalities, design trade-offs, and task-driven validation. Given only the data and target tasks, our system autonomously produces functional single-page VIS Apps with verified linked-view behavior, highly customized to domain experts' specified tasks and needs.

58.3CEMar 28
Sparse Autoencoders as a Steering Basis for Phase Synchronization in Graph-Based CFD Surrogates

Yeping Hu, Ruben Glatt, Shusen Liu

Graph-based surrogate models provide fast alternatives to high-fidelity CFD solvers, but their opaque latent spaces and limited controllability restrict use in safety-critical settings. A key failure mode in oscillatory flows is phase drift, where predictions remain qualitatively correct but gradually lose temporal alignment with observations, limiting use in digital twins and closed-loop control. Correcting this through retraining is expensive and impractical during deployment. We ask whether phase drift can instead be corrected post hoc by manipulating the latent space of a frozen surrogate. We propose a phase-steering framework for pretrained graph-based CFD models that combines the right representation with the right intervention mechanism. To obtain disentangled representation for effective steering, we use sparse autoencoders (SAEs) on frozen MeshGraphNet embeddings. To steer dynamics, we move beyond static per-feature interventions such as scaling or clamping, and introduce a temporally coherent, phase-aware method. Specifically, we identify oscillatory feature pairs with Hilbert analysis, project spatial fields into low-rank temporal coefficients via SVD, and apply smooth time-varying rotations to advance or delay periodic modes while preserving amplitude-phase structure. Using a representation-agnostic setup, we compare SAE-based steering with PCA and raw embedding spaces under the same intervention pipeline. Results show that sparse, disentangled representations outperform dense or entangled ones, while static interventions fail in this dynamical setting. Overall, this work shows that latent-space steering can be extended from semantic domains to time-dependent physical systems when interventions respect the underlying dynamics, and that the same sparse features used for interpretability can also serve as physically meaningful control axes.

42.8HCMar 26
TopoPilot: Reliable Conversational Workflow Automation for Topological Data Analysis and Visualization

Nathaniel Gorski, Shusen Liu, Bei Wang

Recent agentic systems demonstrate that large language models can generate scientific visualizations from natural language. However, reliability remains a major limitation: systems may execute invalid operations, introduce subtle but consequential errors, or fail to request missing information when inputs are underspecified. These issues are amplified in real-world workflows, which often exceed the complexity of standard benchmarks. Ensuring reliability in autonomous visualization pipelines therefore remains an open challenge. We present TopoPilot, a reliable and extensible agentic framework for automating complex scientific visualization workflows. TopoPilot incorporates systematic guardrails and verification mechanisms to ensure reliable operation. While we focus on topological data analysis and visualization as a primary use case, the framework is designed to generalize across visualization domains. TopoPilot adopts a reliability-centered two-agent architecture. An orchestrator agent translates user prompts into workflows composed of atomic backend actions, while a verifier agent evaluates these workflows prior to execution, enforcing structural validity and semantic consistency. This separation of interpretation and verification reduces code-generation errors and enforces correctness guarantees. A modular architecture further improves robustness by isolating components and enabling seamless integration of new descriptors and domain-specific workflows without modifying the core system. To systematically address reliability, we introduce a taxonomy of failure modes and implement targeted safeguards for each class. In evaluations simulating 1,000 multi-turn conversations across 100 prompts, including adversarial and infeasible requests, TopoPilot achieves a success rate exceeding 99%, compared to under 50% for baselines without comprehensive guardrails and checks.

CLNov 8, 2025
Visual Exploration of Feature Relationships in Sparse Autoencoders with Curated Concepts

Xinyuan Yan, Shusen Liu, Kowshik Thopalli et al.

Sparse autoencoders (SAEs) have emerged as a powerful tool for uncovering interpretable features in large language models (LLMs) through the sparse directions they learn. However, the sheer number of extracted directions makes comprehensive exploration intractable. While conventional embedding techniques such as UMAP can reveal global structure, they suffer from limitations including high-dimensional compression artifacts, overplotting, and misleading neighborhood distortions. In this work, we propose a focused exploration framework that prioritizes curated concepts and their corresponding SAE features over attempts to visualize all available features simultaneously. We present an interactive visualization system that combines topology-based visual encoding with dimensionality reduction to faithfully represent both local and global relationships among selected features. This hybrid approach enables users to investigate SAE behavior through targeted, interpretable subsets, facilitating deeper and more nuanced analysis of concept representation in latent space.

77.5AIApr 30
Exploring Interaction Paradigms for LLM Agents in Scientific Visualization

Jackson Vonderhorst, Kuangshi Ai, Haichao Miao et al.

This paper examines how different types of large language model (LLM) agents perform on scientific visualization (SciVis) tasks, where users generate visualization workflows from natural-language instructions. We compare three primary interaction paradigms, including domain-specific agents with structured tool use, computer-use agents, and general-purpose coding agents, by evaluating eight representative agents across 15 benchmark tasks and measuring visualization quality, efficiency, robustness, and computational cost. We further analyze interaction modalities, including code scripts and model context protocol (MCP) or API calls for structured tool use, as well as command-line interfaces (CLI) and graphical user interfaces (GUI) for more general interaction, while additionally studying the effect of persistent memory in selected agents. The results reveal clear tradeoffs across paradigms and modalities. General-purpose coding agents achieve the highest task success rates but are computationally expensive, while domain-specific agents are more efficient and stable but less flexible. Computer-use agents perform well on individual steps but struggle with longer multi-step workflows, indicating that long-horizon planning is their primary limitation. Across both CLI- and GUI-based settings, persistent memory improves performance over repeated trials, although its benefits depend on the underlying interaction mode and the quality of feedback. These findings suggest that no single approach is sufficient, and future SciVis systems should combine structured tool use, interactive capabilities, and adaptive memory mechanisms to balance performance, robustness, and flexibility.

HCDec 7, 2023
AVA: Towards Autonomous Visualization Agents through Visual Perception-Driven Decision-Making

Shusen Liu, Haichao Miao, Zhimin Li et al.

With recent advances in multi-modal foundation models, the previously text-only large language models (LLM) have evolved to incorporate visual input, opening up unprecedented opportunities for various applications in visualization. Our work explores the utilization of the visual perception ability of multi-modal LLMs to develop Autonomous Visualization Agents (AVAs) that can interpret and accomplish user-defined visualization objectives through natural language. We propose the first framework for the design of AVAs and present several usage scenarios intended to demonstrate the general applicability of the proposed paradigm. The addition of visual perception allows AVAs to act as the virtual visualization assistant for domain experts who may lack the knowledge or expertise in fine-tuning visualization outputs. Our preliminary exploration and proof-of-concept agents suggest that this approach can be widely applicable whenever the choices of appropriate visualization parameters require the interpretation of previous visual output. Feedback from unstructured interviews with experts in AI research, medical visualization, and radiology has been incorporated, highlighting the practicality and potential of AVAs. Our study indicates that AVAs represent a general paradigm for designing intelligent visualization systems that can achieve high-level visualization goals, which pave the way for developing expert-level visualization agents in the future.

HCApr 14, 2025
See or Recall: A Sanity Check for the Role of Vision in Solving Visualization Question Answer Tasks with Multimodal LLMs

Zhimin Li, Haichao Miao, Xinyuan Yan et al.

Recent developments in multimodal large language models (MLLM) have equipped language models to reason about vision and language jointly. This permits MLLMs to both perceive and answer questions about data visualization across a variety of designs and tasks. Applying MLLMs to a broad range of visualization tasks requires us to properly evaluate their capabilities, and the most common way to conduct evaluation is through measuring a model's visualization reasoning capability, analogous to how we would evaluate human understanding of visualizations (e.g., visualization literacy). However, we found that in the context of visualization question answering (VisQA), how an MLLM perceives and reasons about visualizations can be fundamentally different from how humans approach the same problem. During the evaluation, even without visualization, the model could correctly answer a substantial portion of the visualization test questions, regardless of whether any selection options were provided. We hypothesize that the vast amount of knowledge encoded in the language model permits factual recall that supersedes the need to seek information from the visual signal. It raises concerns that the current VisQA evaluation may not fully capture the models' visualization reasoning capabilities. To address this, we propose a comprehensive sanity check framework that integrates a rule-based decision tree and a sanity check table to disentangle the effects of "seeing" (visual processing) and "recall" (reliance on prior knowledge). This validates VisQA datasets for evaluation, highlighting where models are truly "seeing", positively or negatively affected by the factual recall, or relying on inductive biases for question answering. Our study underscores the need for careful consideration in designing future visualization understanding studies when utilizing MLLMs.

HCMay 11, 2025
ParaView-MCP: An Autonomous Visualization Agent with Direct Tool Use

Shusen Liu, Haichao Miao, Peer-Timo Bremer

While powerful and well-established, tools like ParaView present a steep learning curve that discourages many potential users. This work introduces ParaView-MCP, an autonomous agent that integrates modern multimodal large language models (MLLMs) with ParaView to not only lower the barrier to entry but also augment ParaView with intelligent decision support. By leveraging the state-of-the-art reasoning, command execution, and vision capabilities of MLLMs, ParaView-MCP enables users to interact with ParaView through natural language and visual inputs. Specifically, our system adopted the Model Context Protocol (MCP) - a standardized interface for model-application communication - that facilitates direct interaction between MLLMs with ParaView's Python API to allow seamless information exchange between the user, the language model, and the visualization tool itself. Furthermore, by implementing a visual feedback mechanism that allows the agent to observe the viewport, we unlock a range of new capabilities, including recreating visualizations from examples, closed-loop visualization parameter updates based on user-defined goals, and even cross-application collaboration involving multiple tools. Broadly, we believe such an agent-driven visualization paradigm can profoundly change the way we interact with visualization tools. We expect a significant uptake in the development of such visualization tools, in both visualization research and industry.

LGDec 6, 2023
Transformer-Powered Surrogates Close the ICF Simulation-Experiment Gap with Extremely Limited Data

Matthew L. Olson, Shusen Liu, Jayaraman J. Thiagarajan et al.

Recent advances in machine learning, specifically transformer architecture, have led to significant advancements in commercial domains. These powerful models have demonstrated superior capability to learn complex relationships and often generalize better to new data and problems. This paper presents a novel transformer-powered approach for enhancing prediction accuracy in multi-modal output scenarios, where sparse experimental data is supplemented with simulation data. The proposed approach integrates transformer-based architecture with a novel graph-based hyper-parameter optimization technique. The resulting system not only effectively reduces simulation bias, but also achieves superior prediction accuracy compared to the prior method. We demonstrate the efficacy of our approach on inertial confinement fusion experiments, where only 10 shots of real-world data are available, as well as synthetic versions of these experiments.

46.9QUANT-PHMar 31
DynQ: A Dynamic Topology-Agnostic Quantum Virtual Machine via Quality-Weighted Community Detection

Shusen Liu, Pascal Jahan Elahi, Ugo Varetto

Quantum cloud platforms have scaled hardware capacity but not the abstraction exposed to users: small programs still monopolise entire processors, and existing Quantum Virtual Machine (QVM) designs often rely on fixed, topology-specific partitions that are brittle under calibration drift, spatial heterogeneity, and transient defects. We present DynQ, a dynamic topology-agnostic QVM that derives execution regions directly from live calibration data. DynQ models a processor as a quality-weighted coupling graph and formulates region discovery as community detection, turning high internal cohesion and low external coupling into a hardware-aware objective for quantum virtualisation. This produces regions that are compilation-friendly, quality-aware, and resilient to degraded couplers and unavailable qubits. DynQ separates offline region discovery from online allocation, enabling low-latency scheduling over pre-validated regions while allowing recomputation under changing hardware conditions. Across five IBM backends, real-device experiments on IBM Kingston and Torino, and cross-architecture evaluation on Rigetti Ankaa-3 via AWS Braket, DynQ improves execution quality, recovers workloads lost under transient defects, and maintains stable output under concurrent batching. It reduces L1 error by up to 45.1% and improves output similarity by up to 19.1% on heterogeneous hardware, while eliminating observed baseline failures on real devices. These results position quantum virtualisation as a graph-driven systems problem and show that adaptive, quality-aware QVMs enable reliable multi-tenant quantum cloud services.

LGNov 20, 2025
Self-supervised and Multi-fidelity Learning for Extended Predictive Soil Spectroscopy

Luning Sun, José L. Safanelli, Jonathan Sanderman et al.

We propose a self-supervised machine learning (SSML) framework for multi-fidelity learning and extended predictive soil spectroscopy based on latent space embeddings. A self-supervised representation was pretrained with the large MIR spectral library and the Variational Autoencoder algorithm to obtain a compressed latent space for generating spectral embeddings. At this stage, only unlabeled spectral data were used, allowing us to leverage the full spectral database and the availability of scan repeats for augmented training. We also leveraged and froze the trained MIR decoder for a spectrum conversion task by plugging it into a NIR encoder to learn the mapping between NIR and MIR spectra in an attempt to leverage the predictive capabilities contained in the large MIR library with a low cost portable NIR scanner. This was achieved by using a smaller subset of the KSSL library with paired NIR and MIR spectra. Downstream machine learning models were then trained to map between original spectra, predicted spectra, and latent space embeddings for nine soil properties. The performance of was evaluated independently of the KSSL training data using a gold-standard test set, along with regression goodness-of-fit metrics. Compared to baseline models, the proposed SSML and its embeddings yielded similar or better accuracy in all soil properties prediction tasks. Predictions derived from the spectrum conversion (NIR to MIR) task did not match the performance of the original MIR spectra but were similar or superior to predictive performance of NIR-only models, suggesting the unified spectral latent space can effectively leverage the larger and more diverse MIR dataset for prediction of soil properties not well represented in current NIR libraries.

HCSep 18, 2025
An Evaluation-Centric Paradigm for Scientific Visualization Agents

Kuangshi Ai, Haichao Miao, Zhimin Li et al.

Recent advances in multi-modal large language models (MLLMs) have enabled increasingly sophisticated autonomous visualization agents capable of translating user intentions into data visualizations. However, measuring progress and comparing different agents remains challenging, particularly in scientific visualization (SciVis), due to the absence of comprehensive, large-scale benchmarks for evaluating real-world capabilities. This position paper examines the various types of evaluation required for SciVis agents, outlines the associated challenges, provides a simple proof-of-concept evaluation example, and discusses how evaluation benchmarks can facilitate agent self-improvement. We advocate for a broader collaboration to develop a SciVis agentic evaluation benchmark that would not only assess existing capabilities but also drive innovation and stimulate future development in the field.

IRJul 23, 2025
VERIRAG: Healthcare Claim Verification via Statistical Audit in Retrieval-Augmented Generation

Shubham Mohole, Hongjun Choi, Shusen Liu et al.

Retrieval-augmented generation (RAG) systems are increasingly adopted in clinical decision support, yet they remain methodologically blind-they retrieve evidence but cannot vet its scientific quality. A paper claiming "Antioxidant proteins decreased after alloferon treatment" and a rigorous multi-laboratory replication study will be treated as equally credible, even if the former lacked scientific rigor or was even retracted. To address this challenge, we introduce VERIRAG, a framework that makes three notable contributions: (i) the Veritable, an 11-point checklist that evaluates each source for methodological rigor, including data integrity and statistical validity; (ii) a Hard-to-Vary (HV) Score, a quantitative aggregator that weights evidence by its quality and diversity; and (iii) a Dynamic Acceptance Threshold, which calibrates the required evidence based on how extraordinary a claim is. Across four datasets-comprising retracted, conflicting, comprehensive, and settled science corpora-the VERIRAG approach consistently outperforms all baselines, achieving absolute F1 scores ranging from 0.53 to 0.65, representing a 10 to 14 point improvement over the next-best method in each respective dataset. We will release all materials necessary for reproducing our results.

CEJul 21, 2025
Interpreting CFD Surrogates through Sparse Autoencoders

Yeping Hu, Shusen Liu

Learning-based surrogate models have become a practical alternative to high-fidelity CFD solvers, but their latent representations remain opaque and hinder adoption in safety-critical or regulation-bound settings. This work introduces a posthoc interpretability framework for graph-based surrogate models used in computational fluid dynamics (CFD) by leveraging sparse autoencoders (SAEs). By obtaining an overcomplete basis in the node embedding space of a pretrained surrogate, the method extracts a dictionary of interpretable latent features. The approach enables the identification of monosemantic concepts aligned with physical phenomena such as vorticity or flow structures, offering a model-agnostic pathway to enhance explainability and trustworthiness in CFD applications.

LGJun 29, 2024
Enhancing Accuracy and Parameter-Efficiency of Neural Representations for Network Parameterization

Hongjun Choi, Jayaraman J. Thiagarajan, Ruben Glatt et al.

In this work, we investigate the fundamental trade-off regarding accuracy and parameter efficiency in the parameterization of neural network weights using predictor networks. We present a surprising finding that, when recovering the original model accuracy is the sole objective, it can be achieved effectively through the weight reconstruction objective alone. Additionally, we explore the underlying factors for improving weight reconstruction under parameter-efficiency constraints, and propose a novel training scheme that decouples the reconstruction objective from auxiliary objectives such as knowledge distillation that leads to significant improvements compared to state-of-the-art approaches. Finally, these results pave way for more practical scenarios, where one needs to achieve improvements on both model accuracy and predictor network parameter-efficiency simultaneously.

CLJun 24, 2024
Visualization Literacy of Multimodal Large Language Models: A Comparative Study

Zhimin Li, Haichao Miao, Valerio Pascucci et al.

The recent introduction of multimodal large language models (MLLMs) combine the inherent power of large language models (LLMs) with the renewed capabilities to reason about the multimodal context. The potential usage scenarios for MLLMs significantly outpace their text-only counterparts. Many recent works in visualization have demonstrated MLLMs' capability to understand and interpret visualization results and explain the content of the visualization to users in natural language. In the machine learning community, the general vision capabilities of MLLMs have been evaluated and tested through various visual understanding benchmarks. However, the ability of MLLMs to accomplish specific visualization tasks based on visual perception has not been properly explored and evaluated, particularly, from a visualization-centric perspective. In this work, we aim to fill the gap by utilizing the concept of visualization literacy to evaluate MLLMs. We assess MLLMs' performance over two popular visualization literacy evaluation datasets (VLAT and mini-VLAT). Under the framework of visualization literacy, we develop a general setup to compare different multimodal large language models (e.g., GPT4-o, Claude 3 Opus, Gemini 1.5 Pro) as well as against existing human baselines. Our study demonstrates MLLMs' competitive performance in visualization literacy, where they outperform humans in certain tasks such as identifying correlations, clusters, and hierarchical structures.

LGJun 25, 2021
Reliable Graph Neural Network Explanations Through Adversarial Training

Donald Loveland, Shusen Liu, Bhavya Kailkhura et al.

Graph neural network (GNN) explanations have largely been facilitated through post-hoc introspection. While this has been deemed successful, many post-hoc explanation methods have been shown to fail in capturing a model's learned representation. Due to this problem, it is worthwhile to consider how one might train a model so that it is more amenable to post-hoc analysis. Given the success of adversarial training in the computer vision domain to train models with more reliable representations, we propose a similar training paradigm for GNNs and analyze the respective impact on a model's explanations. In instances without ground truth labels, we also determine how well an explanation method is utilizing a model's learned representation through a new metric and demonstrate adversarial training can help better extract domain-relevant insights in chemistry.

LGJul 16, 2020
Explainable Deep Learning for Uncovering Actionable Scientific Insights for Materials Discovery and Design

Shusen Liu, Bhavya Kailkhura, Jize Zhang et al.

The scientific community has been increasingly interested in harnessing the power of deep learning to solve various domain challenges. However, despite the effectiveness in building predictive models, fundamental challenges exist in extracting actionable knowledge from deep neural networks due to their opaque nature. In this work, we propose techniques for exploring the behavior of deep learning models by injecting domain-specific actionable attributes as tunable "knobs" in the analysis pipeline. By incorporating the domain knowledge in a generative modeling framework, we are not only able to better understand the behavior of these black-box models, but also provide scientists with actionable insights that can potentially lead to fundamental discoveries.

CVJun 30, 2020
Actionable Attribution Maps for Scientific Machine Learning

Shusen Liu, Bhavya Kailkhura, Jize Zhang et al.

The scientific community has been increasingly interested in harnessing the power of deep learning to solve various domain challenges. However, despite the effectiveness in building predictive models, fundamental challenges exist in extracting actionable knowledge from the deep neural network due to their opaque nature. In this work, we propose techniques for exploring the behavior of deep learning models by injecting domain-specific actionable concepts as tunable ``knobs'' in the analysis pipeline. By incorporating the domain knowledge with generative modeling, we are not only able to better understand the behavior of these black-box models, but also provide scientists with actionable insights that can potentially lead to fundamental discoveries.

DCOct 5, 2019
Parallelizing Training of Deep Generative Models on Massive Scientific Datasets

Sam Ade Jacobs, Brian Van Essen, David Hysom et al.

Training deep neural networks on large scientific data is a challenging task that requires enormous compute power, especially if no pre-trained models exist to initialize the process. We present a novel tournament method to train traditional as well as generative adversarial networks built on LBANN, a scalable deep learning framework optimized for HPC systems. LBANN combines multiple levels of parallelism and exploits some of the worlds largest supercomputers. We demonstrate our framework by creating a complex predictive model based on multi-variate data from high-energy-density physics containing hundreds of millions of images and hundreds of millions of scalar values derived from tens of millions of simulations of inertial confinement fusion. Our approach combines an HPC workflow and extends LBANN with optimized data ingestion and the new tournament-style training algorithm to produce a scalable neural network architecture using a CORAL-class supercomputer. Experimental results show that 64 trainers (1024 GPUs) achieve a speedup of 70.2 over a single trainer (16 GPUs) baseline, and an effective 109% parallel efficiency.

COMP-PHOct 3, 2019
Exploring Generative Physics Models with Scientific Priors in Inertial Confinement Fusion

Rushil Anirudh, Jayaraman J. Thiagarajan, Shusen Liu et al.

There is significant interest in using modern neural networks for scientific applications due to their effectiveness in modeling highly complex, non-linear problems in a data-driven fashion. However, a common challenge is to verify the scientific plausibility or validity of outputs predicted by a neural network. This work advocates the use of known scientific constraints as a lens into evaluating, exploring, and understanding such predictions for the problem of inertial confinement fusion.

LGSep 25, 2019
Function Preserving Projection for Scalable Exploration of High-Dimensional Data

Shusen Liu, Rushil Anirudh, Jayaraman J. Thiagarajan et al.

We present function preserving projections (FPP), a scalable linear projection technique for discovering interpretable relationships in high-dimensional data. Conventional dimension reduction methods aim to maximally preserve the global and/or local geometric structure of a dataset. However, in practice one is often more interested in determining how one or multiple user-selected response function(s) can be explained by the data. To intuitively connect the responses to the data, FPP constructs 2D linear embeddings optimized to reveal interpretable yet potentially non-linear patterns of the response functions. More specifically, FPP is designed to (i) produce human-interpretable embeddings; (ii) capture non-linear relationships; (iii) allow the simultaneous use of multiple response functions; and (iv) scale to millions of samples. Using FPP on real-world datasets, one can obtain fundamentally new insights about high-dimensional relationships in large-scale data that could not be achieved using existing dimension reduction methods.

LGJul 19, 2019
Scalable Topological Data Analysis and Visualization for Evaluating Data-Driven Models in Scientific Applications

Shusen Liu, Di Wang, Dan Maljovec et al.

With the rapid adoption of machine learning techniques for large-scale applications in science and engineering comes the convergence of two grand challenges in visualization. First, the utilization of black box models (e.g., deep neural networks) calls for advanced techniques in exploring and interpreting model behaviors. Second, the rapid growth in computing has produced enormous datasets that require techniques that can handle millions or more samples. Although some solutions to these interpretability challenges have been proposed, they typically do not scale beyond thousands of samples, nor do they provide the high-level intuition scientists are looking for. Here, we present the first scalable solution to explore and analyze high-dimensional functions often encountered in the scientific data analysis pipeline. By combining a new streaming neighborhood graph construction, the corresponding topology computation, and a novel data aggregation scheme, namely topology aware datacubes, we enable interactive exploration of both the topological and the geometric aspect of high-dimensional data. Following two use cases from high-energy-density (HED) physics and computational biology, we demonstrate how these capabilities have led to crucial new insights in both applications.

LGJul 6, 2019
Generative Counterfactual Introspection for Explainable Deep Learning

Shusen Liu, Bhavya Kailkhura, Donald Loveland et al.

In this work, we propose an introspection technique for deep neural networks that relies on a generative model to instigate salient editing of the input image for model interpretation. Such modification provides the fundamental interventional operation that allows us to obtain answers to counterfactual inquiries, i.e., what meaningful change can be made to the input image in order to alter the prediction. We demonstrate how to reveal interesting properties of the given classifiers by utilizing the proposed introspection approach on both the MNIST and the CelebA dataset.

MLDec 19, 2017
Exploring High-Dimensional Structure via Axis-Aligned Decomposition of Linear Projections

Jayaraman J. Thiagarajan, Shusen Liu, Karthikeyan Natesan Ramamurthy et al.

Two-dimensional embeddings remain the dominant approach to visualize high dimensional data. The choice of embeddings ranges from highly non-linear ones, which can capture complex relationships but are difficult to interpret quantitatively, to axis-aligned projections, which are easy to interpret but are limited to bivariate relationships. Linear project can be considered as a compromise between complexity and interpretability, as they allow explicit axes labels, yet provide significantly more degrees of freedom compared to axis-aligned projections. Nevertheless, interpreting the axes directions, which are linear combinations often with many non-trivial components, remains difficult. To address this problem we introduce a structure aware decomposition of (multiple) linear projections into sparse sets of axis aligned projections, which jointly capture all information of the original linear ones. In particular, we use tools from Dempster-Shafer theory to formally define how relevant a given axis aligned project is to explain the neighborhood relations displayed in some linear projection. Furthermore, we introduce a new approach to discover a diverse set of high quality linear projections and show that in practice the information of $k$ linear projections is often jointly encoded in $\sim k$ axis aligned plots. We have integrated these ideas into an interactive visualization system that allows users to jointly browse both linear projections and their axis aligned representatives. Using a number of case studies we show how the resulting plots lead to more intuitive visualizations and new insight.