LGOct 12, 2022Code
Efficient Offline Policy Optimization with a Learned ModelZichen Liu, Siyi Li, Wee Sun Lee et al.
MuZero Unplugged presents a promising approach for offline policy learning from logged data. It conducts Monte-Carlo Tree Search (MCTS) with a learned model and leverages Reanalyze algorithm to learn purely from offline data. For good performance, MCTS requires accurate learned models and a large number of simulations, thus costing huge computing time. This paper investigates a few hypotheses where MuZero Unplugged may not work well under the offline RL settings, including 1) learning with limited data coverage; 2) learning from offline data of stochastic environments; 3) improperly parameterized models given the offline data; 4) with a low compute budget. We propose to use a regularized one-step look-ahead approach to tackle the above issues. Instead of planning with the expensive MCTS, we use the learned model to construct an advantage estimation based on a one-step rollout. Policy improvements are towards the direction that maximizes the estimated advantage with regularization of the dataset. We conduct extensive empirical studies with BSuite environments to verify the hypotheses and then run our algorithm on the RL Unplugged Atari benchmark. Experimental results show that our proposed approach achieves stable performance even with an inaccurate learned model. On the large-scale Atari benchmark, the proposed method outperforms MuZero Unplugged by 43%. Most significantly, it uses only 5.6% wall-clock time (i.e., 1 hour) compared to MuZero Unplugged (i.e., 17.8 hours) to achieve a 150% IQM normalized score with the same hardware and software stacks. Our implementation is open-sourced at https://github.com/sail-sg/rosmo.
LGNov 21, 2023
Learning to Optimise Wind Farms with Graph TransformersSiyi Li, Arnaud Robert, A. Aldo Faisal et al.
This work proposes a novel data-driven model capable of providing accurate predictions for the power generation of all wind turbines in wind farms of arbitrary layout, yaw angle configurations and wind conditions. The proposed model functions by encoding a wind farm into a fully-connected graph and processing the graph representation through a graph transformer. The graph transformer surrogate is shown to generalise well and is able to uncover latent structural patterns within the graph representation of wind farms. It is demonstrated how the resulting surrogate model can be used to optimise yaw angle configurations using genetic algorithms, achieving similar levels of accuracy to industrially-standard wind farm simulation tools while only taking a fraction of the computational cost.
LGNov 24, 2022
End-to-end Wind Turbine Wake Modelling with Deep Graph Representation LearningSiyi Li, Mingrui Zhang, Matthew D. Piggott
Wind turbine wake modelling is of crucial importance to accurate resource assessment, to layout optimisation, and to the operational control of wind farms. This work proposes a surrogate model for the representation of wind turbine wakes based on a state-of-the-art graph representation learning method termed a graph neural network. The proposed end-to-end deep learning model operates directly on unstructured meshes and has been validated against high-fidelity data, demonstrating its ability to rapidly make accurate 3D flow field predictions for various inlet conditions and turbine yaw angles. The specific graph neural network model employed here is shown to generalise well to unseen data and is less sensitive to over-smoothing compared to common graph neural networks. A case study based upon a real world wind farm further demonstrates the capability of the proposed approach to predict farm scale power generation. Moreover, the proposed graph neural network framework is flexible and highly generic and as formulated here can be applied to any steady state computational fluid dynamics simulations on unstructured meshes.
HCApr 24
AI-based experts' knowledge visualization of cultural heritage: A case study of Terracotta WarriorsSiyi Li, Yue Jiang, Bowen Jing et al.
Advancements in 3D modeling,digital display technologies,and the growing availability of digital cultural heritage data have significantly improved the accuracy of heritage depictions and expanded opportunities for analysis.However,while many studies focus on presenting specific cultural heritage figurines,an often overlooked aspect is the visualization of the Terracotta Warriors as a unified entity.This involves concisely representing the distribution of features and their relationships,providing a clear and insightful presentation that engages practitioners, academics,and wider audiences.To tackle the challenges mentioned above,this research seeks to explore the application of AI methods in processing cultural heritage data.It aims to optimize and augment the dataset,analyze the distribution and relationships of various attributes, and interpret the analysis results through visualization techniques.The Terracotta Warriors,among China's most significant cultural heritages and renowned for their abundance,exquisite workmanship,and magnitude,are chosen as a case study.The contribution of this paper is primarily twofold.Firstly,we constructed a dataset of Terracotta Warriors from Pit No.1,detailing the attributes significant for identifying different Terracotta Warriors.Secondly,we employ various AI methods,such as generative adversarial network and random forest,to process and analyze these attributes,followed by visualizing the analysis results for an intuitive presentation.This study introduces a novel scheme for presenting information on a collection of cultural relics,offering a practical case for analyzing and visualizing the Terracotta Warriors'attributes as a whole entity,rather than showcasing individual relics'information in isolation.
NCMar 31
Predicting Neuromodulation Outcome for Parkinson's Disease with Generative Virtual Brain ModelSiyuan Du, Siyi Li, Shuwei Bai et al.
Parkinson's disease (PD) affects over ten million people worldwide. Although temporal interference (TI) and deep brain stimulation (DBS) are promising therapies, inter-individual variability limits empirical treatment selection, increasing non-negligible surgical risk and cost. Previous explorations either resort to limited statistical biomarkers that are insufficient to characterize variability, or employ AI-driven methods which is prone to overfitting and opacity. We bridge this gap with a pretraining-finetuning framework to predict outcomes directly from resting-state fMRI. Critically, a generative virtual brain foundation model, pretrained on a collective dataset (2707 subjects, 5621 sessions) to capture universal disorder patterns, was finetuned on PD cohorts receiving TI (n=51) or DBS (n=55) to yield individualized virtual brains with high fidelity to empirical functional connectivity (r=0.935). By constructing counterfactual estimations between pathological and healthy neural states within these personalized models, we predicted clinical responses (TI: AUPR=0.853; DBS: AUPR=0.915), substantially outperforming baselines. External and prospective validations (n=14, n=11) highlight the feasibility of clinical translation. Moreover, our framework provides state-dependent regional patterns linked to response, offering hypothesis-generating mechanistic insights.
LGJan 14
Draw it like Euclid: Teaching transformer models to generate CAD profiles using ruler and compass construction stepsSiyi Li, Joseph G. Lambourne, Longfei Zhang et al.
We introduce a new method of generating Computer Aided Design (CAD) profiles via a sequence of simple geometric constructions including curve offsetting, rotations and intersections. These sequences start with geometry provided by a designer and build up the points and curves of the final profile step by step. We demonstrate that adding construction steps between the designer's input geometry and the final profile improves generation quality in a similar way to the introduction of a chain of thought in language models. Similar to the constraints in a parametric CAD model, the construction sequences reduce the degrees of freedom in the modeled shape to a small set of parameter values which can be adjusted by the designer, allowing parametric editing with the constructed geometry evaluated to floating point precision. In addition we show that applying reinforcement learning to the construction sequences gives further improvements over a wide range of metrics, including some which were not explicitly optimized.
CVNov 24, 2024Code
ROOT: VLM based System for Indoor Scene Understanding and BeyondYonghui Wang, Shi-Yong Chen, Zhenxing Zhou et al.
Recently, Vision Language Models (VLMs) have experienced significant advancements, yet these models still face challenges in spatial hierarchical reasoning within indoor scenes. In this study, we introduce ROOT, a VLM-based system designed to enhance the analysis of indoor scenes. Specifically, we first develop an iterative object perception algorithm using GPT-4V to detect object entities within indoor scenes. This is followed by employing vision foundation models to acquire additional meta-information about the scene, such as bounding boxes. Building on this foundational data, we propose a specialized VLM, SceneVLM, which is capable of generating spatial hierarchical scene graphs and providing distance information for objects within indoor environments. This information enhances our understanding of the spatial arrangement of indoor scenes. To train our SceneVLM, we collect over 610,000 images from various public indoor datasets and implement a scene data generation pipeline with a semi-automated technique to establish relationships and estimate distances among indoor objects. By utilizing this enriched data, we conduct various training recipes and finish SceneVLM. Our experiments demonstrate that \rootname facilitates indoor scene understanding and proves effective in diverse downstream applications, such as 3D scene generation and embodied AI. The code will be released at \url{https://github.com/harrytea/ROOT}.
LGFeb 13, 2025
Machine learning for modelling unstructured grid data in computational physics: a reviewSibo Cheng, Marc Bocquet, Weiping Ding et al.
Unstructured grid data are essential for modelling complex geometries and dynamics in computational physics. Yet, their inherent irregularity presents significant challenges for conventional machine learning (ML) techniques. This paper provides a comprehensive review of advanced ML methodologies designed to handle unstructured grid data in high-dimensional dynamical systems. Key approaches discussed include graph neural networks, transformer models with spatial attention mechanisms, interpolation-integrated ML methods, and meshless techniques such as physics-informed neural networks. These methodologies have proven effective across diverse fields, including fluid dynamics and environmental simulations. This review is intended as a guidebook for computational scientists seeking to apply ML approaches to unstructured grid data in their domains, as well as for ML researchers looking to address challenges in computational physics. It places special focus on how ML methods can overcome the inherent limitations of traditional numerical techniques and, conversely, how insights from computational physics can inform ML development. To support benchmarking, this review also provides a summary of open-access datasets of unstructured grid data in computational physics. Finally, emerging directions such as generative models with unstructured data, reinforcement learning for mesh generation, and hybrid physics-data-driven paradigms are discussed to inspire future advancements in this evolving field.
CLAug 21, 2025
A Survey on Large Language Model BenchmarksShiwen Ni, Guhong Chen, Shuaimin Li et al.
In recent years, with the rapid development of the depth and breadth of large language models' capabilities, various corresponding evaluation benchmarks have been emerging in increasing numbers. As a quantitative assessment tool for model performance, benchmarks are not only a core means to measure model capabilities but also a key element in guiding the direction of model development and promoting technological innovation. We systematically review the current status and development of large language model benchmarks for the first time, categorizing 283 representative benchmarks into three categories: general capabilities, domain-specific, and target-specific. General capability benchmarks cover aspects such as core linguistics, knowledge, and reasoning; domain-specific benchmarks focus on fields like natural sciences, humanities and social sciences, and engineering technology; target-specific benchmarks pay attention to risks, reliability, agents, etc. We point out that current benchmarks have problems such as inflated scores caused by data contamination, unfair evaluation due to cultural and linguistic biases, and lack of evaluation on process credibility and dynamic environments, and provide a referable design paradigm for future benchmark innovation.
LGMar 17, 2025
Agents Play Thousands of 3D Video GamesZhongwen Xu, Xianliang Wang, Siyi Li et al.
We present PORTAL, a novel framework for developing artificial intelligence agents capable of playing thousands of 3D video games through language-guided policy generation. By transforming decision-making problems into language modeling tasks, our approach leverages large language models (LLMs) to generate behavior trees represented in domain-specific language (DSL). This method eliminates the computational burden associated with traditional reinforcement learning approaches while preserving strategic depth and rapid adaptability. Our framework introduces a hybrid policy structure that combines rule-based nodes with neural network components, enabling both high-level strategic reasoning and precise low-level control. A dual-feedback mechanism incorporating quantitative game metrics and vision-language model analysis facilitates iterative policy improvement at both tactical and strategic levels. The resulting policies are instantaneously deployable, human-interpretable, and capable of generalizing across diverse gaming environments. Experimental results demonstrate PORTAL's effectiveness across thousands of first-person shooter (FPS) games, showcasing significant improvements in development efficiency, policy generalization, and behavior diversity compared to traditional approaches. PORTAL represents a significant advancement in game AI development, offering a practical solution for creating sophisticated agents that can operate across thousands of commercial video games with minimal development overhead. Experiment results on the 3D video games are best viewed on https://zhongwen.one/projects/portal .
LGAug 26, 2025
Predicting Drug-Drug Interactions Using Heterogeneous Graph Neural Networks: HGNN-DDIHongbo Liu, Siyi Li, Zheng Yu
Drug-drug interactions (DDIs) are a major concern in clinical practice, as they can lead to reduced therapeutic efficacy or severe adverse effects. Traditional computational approaches often struggle to capture the complex relationships among drugs, targets, and biological entities. In this work, we propose HGNN-DDI, a heterogeneous graph neural network model designed to predict potential DDIs by integrating multiple drug-related data sources. HGNN-DDI leverages graph representation learning to model heterogeneous biomedical networks, enabling effective information propagation across diverse node and edge types. Experimental results on benchmark DDI datasets demonstrate that HGNN-DDI outperforms state-of-the-art baselines in prediction accuracy and robustness, highlighting its potential to support safer drug development and precision medicine.
AIMar 7
CoTJudger: A Graph-Driven Framework for Automatic Evaluation of Chain-of-Thought Efficiency and Redundancy in LRMsSiyi Li, Jiajun Shi, Shiwen Ni et al.
Large Reasoning Models (LRMs) have demonstrated strong performance by producing extended Chain-of-Thought (CoT) traces before answering. However, this paradigm often induces over-reasoning: redundant calculations and circular self-verification that increase computational cost without improving outcomes. Existing evaluations largely emphasize final accuracy or coarse token counts, and lack automated tools to separate essential logic from structural redundancy. We introduce CoTJudger, a graph-driven framework that quantifies reasoning efficiency by converting free-form CoTs into directed dependency graphs and extracting the Shortest Effective Path (SEP) needed to reach a correct solution. This yields an interpretable efficiency signal -- how much of a CoT is necessary versus structurally redundant -- that is comparable across models and tasks. Evaluating 21 LRMs, CoTJudger reveals pervasive redundancy and surfaces recurring failure modes, including verification obsession and compensatory redundancy. These results provide a practical metric for disentangling reasoning ability from computational waste, enabling more targeted evaluation and diagnosis of LRM efficiency.
CVNov 23, 2025
UniFlow: Towards Zero-Shot LiDAR Scene Flow for Autonomous Vehicles via Cross-Domain GeneralizationSiyi Li, Qingwen Zhang, Ishan Khatri et al.
LiDAR scene flow is the task of estimating per-point 3D motion between consecutive point clouds. Recent methods achieve centimeter-level accuracy on popular autonomous vehicle (AV) datasets, but are typically only trained and evaluated on a single sensor. In this paper, we aim to learn general motion priors that transfer to diverse and unseen LiDAR sensors. However, prior work in LiDAR semantic segmentation and 3D object detection demonstrate that naively training on multiple datasets yields worse performance than single dataset models. Interestingly, we find that this conventional wisdom does not hold for motion estimation, and that state-of-the-art scene flow methods greatly benefit from cross-dataset training. We posit that low-level tasks such as motion estimation may be less sensitive to sensor configuration; indeed, our analysis shows that models trained on fast-moving objects (e.g., from highway datasets) perform well on fast-moving objects, even across different datasets. Informed by our analysis, we propose UniFlow, a family of feedforward models that unifies and trains on multiple large-scale LiDAR scene flow datasets with diverse sensor placements and point cloud densities. Our frustratingly simple solution establishes a new state-of-the-art on Waymo and nuScenes, improving over prior work by 5.1% and 35.2% respectively. Moreover, UniFlow achieves state-of-the-art accuracy on unseen datasets like TruckScenes, outperforming prior TruckScenes-specific models by 30.1%.
LGMar 15, 2025
Eval-PPO: Building an Efficient Threat Evaluator Using Proximal Policy OptimizationWuzhou Sun, Siyi Li, Qingxiang Zou et al.
In various game scenarios, selecting a fixed number of targets from multiple enemy units is an extremely challenging task. This difficulty stems from the complex relationship between the threat levels of enemy units and their feature characteristics, which complicates the design of rule-based evaluators. Moreover, traditional supervised learning methods face the challenge of lacking explicit labels during training when applied to this threat evaluation problem. In this study, we redefine the threat evaluation problem as a reinforcement learning task and introduce an efficient evaluator training algorithm, Eval-PPO, based on the Proximal Policy Optimization (PPO) algorithm. Eval-PPO integrates multidimensional enemy features and the state information of friendly units through systematic training, thereby achieving precise threat assessment. Compared with rule-based methods, Eval-PPO demonstrates a significant improvement in average success rate, with an increase of 17.84%.
LGMar 4, 2025
Reinforcement Learning-based Threat AssessmentWuzhou Sun, Siyi Li, Qingxiang Zou et al.
In some game scenarios, due to the uncertainty of the number of enemy units and the priority of various attributes, the evaluation of the threat level of enemy units as well as the screening has been a challenging research topic, and the core difficulty lies in how to reasonably set the priority of different attributes in order to achieve quantitative evaluation of the threat. In this paper, we innovatively transform the problem of threat assessment into a reinforcement learning problem, and through systematic reinforcement learning training, we successfully construct an efficient neural network evaluator. The evaluator can not only comprehensively integrate the multidimensional attribute features of the enemy, but also effectively combine our state information, thus realizing a more accurate and scientific threat assessment.
NAJun 29, 2024
Towards Universal Mesh Movement NetworksMingrui Zhang, Chunyang Wang, Stephan Kramer et al.
Solving complex Partial Differential Equations (PDEs) accurately and efficiently is an essential and challenging problem in all scientific and engineering disciplines. Mesh movement methods provide the capability to improve the accuracy of the numerical solution without increasing the overall mesh degree of freedom count. Conventional sophisticated mesh movement methods are extremely expensive and struggle to handle scenarios with complex boundary geometries. However, existing learning-based methods require re-training from scratch given a different PDE type or boundary geometry, which limits their applicability, and also often suffer from robustness issues in the form of inverted elements. In this paper, we introduce the Universal Mesh Movement Network (UM2N), which -- once trained -- can be applied in a non-intrusive, zero-shot manner to move meshes with different size distributions and structures, for solvers applicable to different PDE types and boundary geometries. UM2N consists of a Graph Transformer (GT) encoder for extracting features and a Graph Attention Network (GAT) based decoder for moving the mesh. We evaluate our method on advection and Navier-Stokes based examples, as well as a real-world tsunami simulation case. Our method outperforms existing learning-based mesh movement methods in terms of the benchmarks described above. In comparison to the conventional sophisticated Monge-Ampère PDE-solver based method, our approach not only significantly accelerates mesh movement, but also proves effective in scenarios where the conventional method fails. Our project page is at https://erizmr.github.io/UM2N/.
CVDec 1, 2021
Confidence Propagation Cluster: Unleash Full Potential of Object DetectorsYichun Shen, Wanli Jiang, Zhen Xu et al.
It has been a long history that most object detection methods obtain objects by using the non-maximum suppression (NMS) and its improved versions like Soft-NMS to remove redundant bounding boxes. We challenge those NMS-based methods from three aspects: 1) The bounding box with highest confidence value may not be the true positive having the biggest overlap with the ground-truth box. 2) Not only suppression is required for redundant boxes, but also confidence enhancement is needed for those true positives. 3) Sorting candidate boxes by confidence values is not necessary so that full parallelism is achievable. In this paper, inspired by belief propagation (BP), we propose the Confidence Propagation Cluster (CP-Cluster) to replace NMS-based methods, which is fully parallelizable as well as better in accuracy. In CP-Cluster, we borrow the message passing mechanism from BP to penalize redundant boxes and enhance true positives simultaneously in an iterative way until convergence. We verified the effectiveness of CP-Cluster by applying it to various mainstream detectors such as FasterRCNN, SSD, FCOS, YOLOv3, YOLOv5, Centernet etc. Experiments on MS COCO show that our plug and play method, without retraining detectors, is able to steadily improve average mAP of all those state-of-the-art models with a clear margin from 0.3 to 1.9 respectively when compared with NMS-based methods.
CVOct 27, 2021
From Image to Imuge: Immunized Image GenerationQichao Ying, Zhenxing Qian, Hang Zhou et al.
We introduce Imuge, an image tamper resilient generative scheme for image self-recovery. The traditional manner of concealing image content within the image are inflexible and fragile to diverse digital attack, i.e. image cropping and JPEG compression. To address this issue, we jointly train a U-Net backboned encoder, a tamper localization network and a decoder for image recovery. Given an original image, the encoder produces a visually indistinguishable immunized image. At the recipient's side, the verifying network localizes the malicious modifications, and the original content can be approximately recovered by the decoder, despite the presence of the attacks. Several strategies are proposed to boost the training efficiency. We demonstrate that our method can recover the details of the tampered regions with a high quality despite the presence of various kinds of attacks. Comprehensive ablation studies are conducted to validate our network designs.
AISep 24, 2017
Learning Unmanned Aerial Vehicle Control for Autonomous Target FollowingSiyi Li, Tianbo Liu, Chi Zhang et al.
While deep reinforcement learning (RL) methods have achieved unprecedented successes in a range of challenging problems, their applicability has been mainly limited to simulation or game domains due to the high sample complexity of the trial-and-error learning process. However, real-world robotic applications often need a data-efficient learning process with safety-critical constraints. In this paper, we consider the challenging problem of learning unmanned aerial vehicle (UAV) control for tracking a moving target. To acquire a strategy that combines perception and control, we represent the policy by a convolutional neural network. We develop a hierarchical approach that combines a model-free policy gradient method with a conventional feedback proportional-integral-derivative (PID) controller to enable stable learning without catastrophic failure. The neural network is trained by a combination of supervised learning from raw images and reinforcement learning from games of self-play. We show that the proposed approach can learn a target following policy in a simulator efficiently and the learned behavior can be successfully transferred to the DJI quadrotor platform for real-world UAV control.
CVJan 19, 2015
Transferring Rich Feature Hierarchies for Robust Visual TrackingNaiyan Wang, Siyi Li, Abhinav Gupta et al.
Convolutional neural network (CNN) models have demonstrated great success in various computer vision tasks including image classification and object detection. However, some equally important tasks such as visual tracking remain relatively unexplored. We believe that a major hurdle that hinders the application of CNN to visual tracking is the lack of properly labeled training data. While existing applications that liberate the power of CNN often need an enormous amount of training data in the order of millions, visual tracking applications typically have only one labeled example in the first frame of each video. We address this research issue here by pre-training a CNN offline and then transferring the rich feature hierarchies learned to online tracking. The CNN is also fine-tuned during online tracking to adapt to the appearance of the tracked target specified in the first video frame. To fit the characteristics of object tracking, we first pre-train the CNN to recognize what is an object, and then propose to generate a probability map instead of producing a simple class label. Using two challenging open benchmarks for performance evaluation, our proposed tracker has demonstrated substantial improvement over other state-of-the-art trackers.