Haorui Li

LG
h-index4
7papers
24citations
Novelty54%
AI Score54

7 Papers

LGJan 9
AIConfigurator: Lightning-Fast Configuration Optimization for Multi-Framework LLM Serving

Tianhao Xu, Yiming Liu, Xianglong Lu et al.

Optimizing Large Language Model (LLM) inference in production systems is increasingly difficult due to dynamic workloads, stringent latency/throughput targets, and a rapidly expanding configuration space. This complexity spans not only distributed parallelism strategies (tensor/pipeline/expert) but also intricate framework-specific runtime parameters such as those concerning the enablement of CUDA graphs, available KV-cache memory fractions, and maximum token capacity, which drastically impact performance. The diversity of modern inference frameworks (e.g., TRT-LLM, vLLM, SGLang), each employing distinct kernels and execution policies, makes manual tuning both framework-specific and computationally prohibitive. We present AIConfigurator, a unified performance-modeling system that enables rapid, framework-agnostic inference configuration search without requiring GPU-based profiling. AIConfigurator combines (1) a methodology that decomposes inference into analytically modelable primitives - GEMM, attention, communication, and memory operations while capturing framework-specific scheduling dynamics; (2) a calibrated kernel-level performance database for these primitives across a wide range of hardware platforms and popular open-weights models (GPT-OSS, Qwen, DeepSeek, LLama, Mistral); and (3) an abstraction layer that automatically resolves optimal launch parameters for the target backend, seamlessly integrating into production-grade orchestration systems. Evaluation on production LLM serving workloads demonstrates that AIConfigurator identifies superior serving configurations that improve performance by up to 40% for dense models (e.g., Qwen3-32B) and 50% for MoE architectures (e.g., DeepSeek-V3), while completing searches within 30 seconds on average. Enabling the rapid exploration of vast design spaces - from cluster topology down to engine specific flags.

LGAug 2, 2023
Wasserstein Diversity-Enriched Regularizer for Hierarchical Reinforcement Learning

Haorui Li, Jiaqi Liang, Linjing Li et al.

Hierarchical reinforcement learning composites subpolicies in different hierarchies to accomplish complex tasks.Automated subpolicies discovery, which does not depend on domain knowledge, is a promising approach to generating subpolicies.However, the degradation problem is a challenge that existing methods can hardly deal with due to the lack of consideration of diversity or the employment of weak regularizers. In this paper, we propose a novel task-agnostic regularizer called the Wasserstein Diversity-Enriched Regularizer (WDER), which enlarges the diversity of subpolicies by maximizing the Wasserstein distances among action distributions. The proposed WDER can be easily incorporated into the loss function of existing methods to boost their performance further.Experimental results demonstrate that our WDER improves performance and sample efficiency in comparison with prior work without modifying hyperparameters, which indicates the applicability and robustness of the WDER.

30.8ROMay 13
Object Manipulation of the Variable Topology Truss system

Andrew Jang-Ho Bae, Myeongjin Choi, Haorui Li et al.

This paper presents an object manipulation strategy for the Variable Topology Truss (VTT) system, a truss robot that comprises actuated truss members connected by passive spherical joints. Although truss robots were originally proposed as rapidly deployable manipulators, manipulation strategy has not been studied thoroughly. To enable manipulation, we introduce a hybrid control framework that regulates position and force concurrently without explicit decoupling. At the actuator level, each member employs a sensor-based force feedback controller to generate the desired axial forces despite high actuator friction. At the task level, the forces applied at the end-effector nodes are produced by computing the required member forces using a static model of the VTT. We evaluate force-tracking performance through experiments on both a single member module and the full VTT system. Finally, we demonstrate object manipulation using two representative configurations and quantitatively assess combined position and force tracking performance. Experimental results confirm that the proposed approach enables consistent and reliable object manipulation with the VTT system.

LGOct 31, 2025
InertialAR: Autoregressive 3D Molecule Generation with Inertial Frames

Haorui Li, Weitao Du, Yuqiang Li et al.

Transformer-based autoregressive models have emerged as a unifying paradigm across modalities such as text and images, but their extension to 3D molecule generation remains underexplored. The gap stems from two fundamental challenges: (1) tokenizing molecules into a canonical 1D sequence of tokens that is invariant to both SE(3) transformations and atom index permutations, and (2) designing an architecture capable of modeling hybrid atom-based tokens that couple discrete atom types with continuous 3D coordinates. To address these challenges, we introduce InertialAR. InertialAR devises a canonical tokenization that aligns molecules to their inertial frames and reorders atoms to ensure SE(3) and permutation invariance. Moreover, InertialAR equips the attention mechanism with geometric awareness via geometric rotary positional encoding (GeoRoPE). In addition, it utilizes a hierarchical autoregressive paradigm to predict the next atom-based token, predicting the atom type first and then its 3D coordinates via Diffusion loss. Experimentally, InertialAR achieves state-of-the-art performance on 7 of the 10 evaluation metrics for unconditional molecule generation across QM9, GEOM-Drugs, and B3LYP. Moreover, it significantly outperforms strong baselines in controllable generation for targeted chemical functionality, attaining state-of-the-art results across all 5 metrics.

69.0NIMar 12
Efficient Interference Graph Estimation via Concurrent Flooding

Haifeng Jia, Yichen Wei, Zhan Wang et al.

Traditional wisdom for network management allocates network resources separately for the measurement and data transmission tasks. Heavy measurement tasks may take up resources for data transmission and significantly reduce network performance. It is therefore challenging for interference graphs, deemed as incurring heavy measurement overhead, to be used in practice in wireless networks. To address this challenge in wireless sensor networks, we propose to use power as a new dimension for interference graph estimation (IGE) and integrate IGE with concurrent flooding such that IGE can be done simultaneously with flooding using the same frequency-time resources. With controlled and real-world experiments, we show that it is feasible to efficiently achieve IGE via concurrent flooding on the commercial off-the-shelf (COTS) devices by controlling the transmit powers of nodes. We believe that efficient IGE would be a key enabler for the practical use of the existing scheduling algorithms assuming known interference graphs.

80.6PFMay 4
When Is the Same Model Not the Same Service? A Measurement Study of Hosted Open-Weight LLM APIs

Haorui Li, Zhenghui He, Xuanzi Liu et al.

Open-weight large language models (LLMs) are often described as downloadable model artifacts, but in production they are increasingly consumed as hosted APIs. This paper studies the intermediary service layer that turns a model release into an operational endpoint. Using sampled request logs, provider metadata, compatibility probes, pricing snapshots, and continuous latency measurements collected by AI Ping during Q4 2025, we analyze demand concentration, provider heterogeneity, and task-conditioned routing for popular open-weight model families. The first empirical pattern is concentration with inertia: among the model families displayed in the public aggregate, the largest family carries 32.0% of relative demand and the top five carry 87.4%, with a Gini coefficient of 0.693, yet older versions remain active after newer releases. The second pattern is a separation between supply and use: broad provider listing of a model does not imply realized adoption, and listed prices are more anchored than latency, throughput, context length, protocol support, and error semantics. The third pattern is conditionality: applications induce different token-length regimes, so the relevant service object is not a model name but a provider-model-task-time tuple under protocol and context constraints. In two representative counterfactuals, routing lowers Qwen3-32B cost by 37.8% and raises DeepSeek-V3.2 average throughput by about 90% relative to direct official access. These results suggest that open-weight LLM deployment should be studied as a constrained statistical decision problem over a heterogeneous service layer, rather than as a static catalog of model capabilities.

CVJan 30, 2024
BoostDream: Efficient Refining for High-Quality Text-to-3D Generation from Multi-View Diffusion

Yonghao Yu, Shunan Zhu, Huai Qin et al.

Witnessing the evolution of text-to-image diffusion models, significant strides have been made in text-to-3D generation. Currently, two primary paradigms dominate the field of text-to-3D: the feed-forward generation solutions, capable of swiftly producing 3D assets but often yielding coarse results, and the Score Distillation Sampling (SDS) based solutions, known for generating high-fidelity 3D assets albeit at a slower pace. The synergistic integration of these methods holds substantial promise for advancing 3D generation techniques. In this paper, we present BoostDream, a highly efficient plug-and-play 3D refining method designed to transform coarse 3D assets into high-quality. The BoostDream framework comprises three distinct processes: (1) We introduce 3D model distillation that fits differentiable representations from the 3D assets obtained through feed-forward generation. (2) A novel multi-view SDS loss is designed, which utilizes a multi-view aware 2D diffusion model to refine the 3D assets. (3) We propose to use prompt and multi-view consistent normal maps as guidance in refinement.Our extensive experiment is conducted on different differentiable 3D representations, revealing that BoostDream excels in generating high-quality 3D assets rapidly, overcoming the Janus problem compared to conventional SDS-based methods. This breakthrough signifies a substantial advancement in both the efficiency and quality of 3D generation processes.