Xiaoye Li

CV
h-index50
5papers
110citations
Novelty50%
AI Score45

5 Papers

CVMay 11, 2022Code
AggPose: Deep Aggregation Vision Transformer for Infant Pose Estimation

Xu Cao, Xiaoye Li, Liya Ma et al.

Movement and pose assessment of newborns lets experienced pediatricians predict neurodevelopmental disorders, allowing early intervention for related diseases. However, most of the newest AI approaches for human pose estimation methods focus on adults, lacking publicly benchmark for infant pose estimation. In this paper, we fill this gap by proposing infant pose dataset and Deep Aggregation Vision Transformer for human pose estimation, which introduces a fast trained full transformer framework without using convolution operations to extract features in the early stages. It generalizes Transformer + MLP to high-resolution deep layer aggregation within feature maps, thus enabling information fusion between different vision levels. We pre-train AggPose on COCO pose dataset and apply it on our newly released large-scale infant pose estimation dataset. The results show that AggPose could effectively learn the multi-scale features among different resolutions and significantly improve the performance of infant pose estimation. We show that AggPose outperforms hybrid model HRFormer and TokenPose in the infant pose estimation dataset. Moreover, our AggPose outperforms HRFormer by 0.8 AP on COCO val pose estimation on average. Our code is available at github.com/SZAR-LAB/AggPose.

DCMar 14, 2021
TRUST: Triangle Counting Reloaded on GPUs

Santosh Pandey, Zhibin Wang, Sheng Zhong et al.

Triangle counting is a building block for a wide range of graph applications. Traditional wisdom suggests that i) hashing is not suitable for triangle counting, ii) edge-centric triangle counting beats vertex-centric design, and iii) communication-free and workload balanced graph partitioning is a grand challenge for triangle counting. On the contrary, we advocate that i) hashing can help the key operations for scalable triangle counting on Graphics Processing Units (GPUs), i.e., list intersection and graph partitioning, ii)vertex-centric design reduces both hash table construction cost and memory consumption, which is limited on GPUs. In addition, iii) we exploit graph and workload collaborative, and hashing-based 2D partitioning to scale vertex-centric triangle counting over 1,000 GPUswith sustained scalability. In this work, we present TRUST which performs triangle counting with the hash operation and vertex-centric mechanism at the core. To the best of our knowledge, TRUSTis the first work that achieves over one trillion Traversed Edges Per Second (TEPS) rate for triangle counting.

CVOct 14, 2022
Semi-supervised Body Parsing and Pose Estimation for Enhancing Infant General Movement Assessment

Haomiao Ni, Yuan Xue, Liya Ma et al.

General movement assessment (GMA) of infant movement videos (IMVs) is an effective method for early detection of cerebral palsy (CP) in infants. We demonstrate in this paper that end-to-end trainable neural networks for image sequence recognition can be applied to achieve good results in GMA, and more importantly, augmenting raw video with infant body parsing and pose estimation information can significantly improve performance. To solve the problem of efficiently utilizing partially labeled IMVs for body parsing, we propose a semi-supervised model, termed SiamParseNet (SPN), which consists of two branches, one for intra-frame body parts segmentation and another for inter-frame label propagation. During training, the two branches are jointly trained by alternating between using input pairs of only labeled frames and input of both labeled and unlabeled frames. We also investigate training data augmentation by proposing a factorized video generative adversarial network (FVGAN) to synthesize novel labeled frames for training. When testing, we employ a multi-source inference mechanism, where the final result for a test frame is either obtained via the segmentation branch or via propagation from a nearby key frame. We conduct extensive experiments for body parsing using SPN on two infant movement video datasets, where SPN coupled with FVGAN achieves state-of-the-art performance. We further demonstrate that SPN can be easily adapted to the infant pose estimation task with superior performance. Last but not least, we explore the clinical application of our method for GMA. We collected a new clinical IMV dataset with GMA annotations, and our experiments show that SPN models for body parsing and pose estimation trained on the first two datasets generalize well to the new clinical dataset and their results can significantly boost the CRNN-based GMA prediction performance.

6.7AIMar 28
AstraAI: LLMs, Retrieval, and AST-Guided Assistance for HPC Codebases

Mahesh Natarajan, Xiaoye Li, Weiqun Zhang

We present AstraAI, a command-line interface (CLI) coding framework for high-performance computing (HPC) software development. AstraAI operates directly within a Linux terminal and integrates large language models (LLMs) with Retrieval-Augmented Generation (RAG) and Abstract Syntax Tree (AST)-based structural analysis to enable context-aware code generation for complex scientific codebases. The central idea is to construct a high-fidelity prompt that is passed to the LLM for inference. This prompt augments the user request with relevant code snippets retrieved from the underlying framework codebase via RAG and structural context extracted from AST analysis, providing the model with precise information about relevant functions, data structures, and overall code organization. The framework is designed to perform scoped modifications to source code while preserving structural consistency with the surrounding code. AstraAI supports both locally hosted models from Hugging Face and API-based frontier models accessible via the American Science Cloud, enabling flexible deployment across HPC environments. The system generates code that aligns with existing project structures and programming patterns. We demonstrate AstraAI on representative HPC code generation tasks within AMReX, a DOE-supported HPC software infrastructure for exascale applications.

HEP-LATSep 12, 2025
Matrix-free Neural Preconditioner for the Dirac Operator in Lattice Gauge Theory

Yixuan Sun, Srinivas Eswar, Yin Lin et al.

Linear systems arise in generating samples and in calculating observables in lattice quantum chromodynamics~(QCD). Solving the Hermitian positive definite systems, which are sparse but ill-conditioned, involves using iterative methods, such as Conjugate Gradient (CG), which are time-consuming and computationally expensive. Preconditioners can effectively accelerate this process, with the state-of-the-art being multigrid preconditioners. However, constructing useful preconditioners can be challenging, adding additional computational overhead, especially in large linear systems. We propose a framework, leveraging operator learning techniques, to construct linear maps as effective preconditioners. The method in this work does not rely on explicit matrices from either the original linear systems or the produced preconditioners, allowing efficient model training and application in the CG solver. In the context of the Schwinger model U(1) gauge theory in 1+1 spacetime dimensions with two degenerate-mass fermions), this preconditioning scheme effectively decreases the condition number of the linear systems and approximately halves the number of iterations required for convergence in relevant parameter ranges. We further demonstrate the framework learns a general mapping dependent on the lattice structure which leads to zero-shot learning ability for the Dirac operators constructed from gauge field configurations of different sizes.