h-index30
8papers
48citations
Novelty59%
AI Score53

8 Papers

IVJul 16, 2023Code
Adaptively Placed Multi-Grid Scene Representation Networks for Large-Scale Data Visualization

Skylar Wolfgang Wurster, Tianyu Xiong, Han-Wei Shen et al.

Scene representation networks (SRNs) have been recently proposed for compression and visualization of scientific data. However, state-of-the-art SRNs do not adapt the allocation of available network parameters to the complex features found in scientific data, leading to a loss in reconstruction quality. We address this shortcoming with an adaptively placed multi-grid SRN (APMGSRN) and propose a domain decomposition training and inference technique for accelerated parallel training on multi-GPU systems. We also release an open-source neural volume rendering application that allows plug-and-play rendering with any PyTorch-based SRN. Our proposed APMGSRN architecture uses multiple spatially adaptive feature grids that learn where to be placed within the domain to dynamically allocate more neural network resources where error is high in the volume, improving state-of-the-art reconstruction accuracy of SRNs for scientific data without requiring expensive octree refining, pruning, and traversal like previous adaptive models. In our domain decomposition approach for representing large-scale data, we train an set of APMGSRNs in parallel on separate bricks of the volume to reduce training time while avoiding overhead necessary for an out-of-core solution for volumes too large to fit in GPU memory. After training, the lightweight SRNs are used for realtime neural volume rendering in our open-source renderer, where arbitrary view angles and transfer functions can be explored. A copy of this paper, all code, all models used in our experiments, and all supplemental materials and videos are available at https://github.com/skywolf829/APMGSRN.

LGJul 26, 2024
Regularized Multi-Decoder Ensemble for an Error-Aware Scene Representation Network

Tianyu Xiong, Skylar W. Wurster, Hanqi Guo et al.

Feature grid Scene Representation Networks (SRNs) have been applied to scientific data as compact functional surrogates for analysis and visualization. As SRNs are black-box lossy data representations, assessing the prediction quality is critical for scientific visualization applications to ensure that scientists can trust the information being visualized. Currently, existing architectures do not support inference time reconstruction quality assessment, as coordinate-level errors cannot be evaluated in the absence of ground truth data. We propose a parameter-efficient multi-decoder SRN (MDSRN) ensemble architecture consisting of a shared feature grid with multiple lightweight multi-layer perceptron decoders. MDSRN can generate a set of plausible predictions for a given input coordinate to compute the mean as the prediction of the multi-decoder ensemble and the variance as a confidence score. The coordinate-level variance can be rendered along with the data to inform the reconstruction quality, or be integrated into uncertainty-aware volume visualization algorithms. To prevent the misalignment between the quantified variance and the prediction quality, we propose a novel variance regularization loss for ensemble learning that promotes the Regularized multi-decoder SRN (RMDSRN) to obtain a more reliable variance that correlates closely to the true model error. We comprehensively evaluate the quality of variance quantification and data reconstruction of Monte Carlo Dropout, Mean Field Variational Inference, Deep Ensemble, and Predicting Variance compared to the proposed MDSRN and RMDSRN across diverse scalar field datasets. We demonstrate that RMDSRN attains the most accurate data reconstruction and competitive variance-error correlation among uncertain SRNs under the same neural network parameter budgets.

LGFeb 16
Refine Now, Query Fast: A Decoupled Refinement Paradigm for Implicit Neural Fields

Tianyu Xiong, Skylar Wurster, Han-Wei Shen

Implicit Neural Representations (INRs) have emerged as promising surrogates for large 3D scientific simulations due to their ability to continuously model spatial and conditional fields, yet they face a critical fidelity-speed dilemma: deep MLPs suffer from high inference cost, while efficient embedding-based models lack sufficient expressiveness. To resolve this, we propose the Decoupled Representation Refinement (DRR) architectural paradigm. DRR leverages a deep refiner network, alongside non-parametric transformations, in a one-time offline process to encode rich representations into a compact and efficient embedding structure. This approach decouples slow neural networks with high representational capacity from the fast inference path. We introduce DRR-Net, a simple network that validates this paradigm, and a novel data augmentation strategy, Variational Pairs (VP) for improving INRs under complex tasks like high-dimensional surrogate modeling. Experiments on several ensemble simulation datasets demonstrate that our approach achieves state-of-the-art fidelity, while being up to 27$\times$ faster at inference than high-fidelity baselines and remaining competitive with the fastest models. The DRR paradigm offers an effective strategy for building powerful and practical neural field surrogates and \rev{INRs in broader applications}, with a minimal compromise between speed and quality.

63.3LGMar 31
FA-INR: Adaptive Implicit Neural Representations for Interpretable Exploration of Simulation Ensembles

Ziwei Li, Yuhan Duan, Tianyu Xiong et al.

Surrogate models are essential for efficient exploration of large-scale ensemble simulations. Implicit neural representations (INRs) provide a compact and continuous framework for modeling spatially structured data, but they often struggle with learning complex localized structures within the scientific fields. Recent INR-based surrogates address this by augmenting INRs with explicit feature structures, but at the cost of flexibility and substantial memory overhead. In this paper, we present Feature-Adaptive INR (FA-INR), an adaptive INR-based surrogate model for high-fidelity and interpretable exploration of ensemble simulations. Instead of relying on structured feature representations, FA-INR leverages cross-attention over a learnable key-value memory bank to allocate model capacity adaptively based on the data characteristics. To further improve scalability, we introduce a coordinate-guided mixture of experts (MoE) framework that enhances both efficiency and specialization of feature representations. More importantly, the learned experts produce an interpretable partition over the simulation domain, enabling scientists to identify complex structures and perform localized parameter-space exploration. Beyond quantitative and qualitative evaluations, we also demonstrate that our learned expert specialization can reveal meaningful scientific insights and support localized sensitivity analysis.

LGDec 27, 2023
Refining Latent Homophilic Structures over Heterophilic Graphs for Robust Graph Convolution Networks

Chenyang Qiu, Guoshun Nan, Tianyu Xiong et al.

Graph convolution networks (GCNs) are extensively utilized in various graph tasks to mine knowledge from spatial data. Our study marks the pioneering attempt to quantitatively investigate the GCN robustness over omnipresent heterophilic graphs for node classification. We uncover that the predominant vulnerability is caused by the structural out-of-distribution (OOD) issue. This finding motivates us to present a novel method that aims to harden GCNs by automatically learning Latent Homophilic Structures over heterophilic graphs. We term such a methodology as LHS. To elaborate, our initial step involves learning a latent structure by employing a novel self-expressive technique based on multi-node interactions. Subsequently, the structure is refined using a pairwisely constrained dual-view contrastive learning approach. We iteratively perform the above procedure, enabling a GCN model to aggregate information in a homophilic way on heterophilic graphs. Armed with such an adaptable structure, we can properly mitigate the structural OOD threats over heterophilic graphs. Experiments on various benchmarks show the effectiveness of the proposed LHS approach for robust GCNs.

IRMar 8, 2024
Multi-Tower Multi-Interest Recommendation with User Representation Repel

Tianyu Xiong, Xiaohan Yu

In the era of information overload, the value of recommender systems has been profoundly recognized in academia and industry alike. Multi-interest sequential recommendation, in particular, is a subfield that has been receiving increasing attention in recent years. By generating multiple-user representations, multi-interest learning models demonstrate superior expressiveness than single-user representation models, both theoretically and empirically. Despite major advancements in the field, three major issues continue to plague the performance and adoptability of multi-interest learning methods, the difference between training and deployment objectives, the inability to access item information, and the difficulty of industrial adoption due to its single-tower architecture. We address these challenges by proposing a novel multi-tower multi-interest framework with user representation repel. Experimental results across multiple large-scale industrial datasets proved the effectiveness and generalizability of our proposed framework.

CVMar 5
GloSplat: Joint Pose-Appearance Optimization for Faster and More Accurate 3D Reconstruction

Tianyu Xiong, Rui Li, Linjie Li et al.

Feature extraction, matching, structure from motion (SfM), and novel view synthesis (NVS) have traditionally been treated as separate problems with independent optimization objectives. We present GloSplat, a framework that performs \emph{joint pose-appearance optimization} during 3D Gaussian Splatting training. Unlike prior joint optimization methods (BARF, NeRF--, 3RGS) that rely purely on photometric gradients for pose refinement, GloSplat preserves \emph{explicit SfM feature tracks} as first-class entities throughout training: track 3D points are maintained as separate optimizable parameters from Gaussian primitives, providing persistent geometric anchors via a reprojection loss that operates alongside photometric supervision. This architectural choice prevents early-stage pose drift while enabling fine-grained refinement -- a capability absent in photometric-only approaches. We introduce two pipeline variants: (1) \textbf{GloSplat-F}, a COLMAP-free variant using retrieval-based pair selection for efficient reconstruction, and (2) \textbf{GloSplat-A}, an exhaustive matching variant for maximum quality. Both employ global SfM initialization followed by joint photometric-geometric optimization during 3DGS training. Experiments demonstrate that GloSplat-F achieves state-of-the-art among COLMAP-free methods while GloSplat-A surpasses all COLMAP-based baselines.

CVAug 12, 2025
DiffPose-Animal: A Language-Conditioned Diffusion Framework for Animal Pose Estimation

Tianyu Xiong, Dayi Tan, Wei Tian

Animal pose estimation is a fundamental task in computer vision, with growing importance in ecological monitoring, behavioral analysis, and intelligent livestock management. Compared to human pose estimation, animal pose estimation is more challenging due to high interspecies morphological diversity, complex body structures, and limited annotated data. In this work, we introduce DiffPose-Animal, a novel diffusion-based framework for top-down animal pose estimation. Unlike traditional heatmap regression methods, DiffPose-Animal reformulates pose estimation as a denoising process under the generative framework of diffusion models. To enhance semantic guidance during keypoint generation, we leverage large language models (LLMs) to extract both global anatomical priors and local keypoint-wise semantics based on species-specific prompts. These textual priors are encoded and fused with image features via cross-attention modules to provide biologically meaningful constraints throughout the denoising process. Additionally, a diffusion-based keypoint decoder is designed to progressively refine pose predictions, improving robustness to occlusion and annotation sparsity. Extensive experiments on public animal pose datasets demonstrate the effectiveness and generalization capability of our method, especially under challenging scenarios with diverse species, cluttered backgrounds, and incomplete keypoints.