Kuan Xu

RO
h-index14
18papers
754citations
Novelty54%
AI Score58

18 Papers

AIFeb 18, 2023
Knowledge Graph Completion based on Tensor Decomposition for Disease Gene Prediction

Xinyan Wang, Ting Jia, Chongyu Wang et al. · tsinghua

Accurate identification of disease genes has consistently been one of the keys to decoding a disease's molecular mechanism. Most current approaches focus on constructing biological networks and utilizing machine learning, especially, deep learning to identify disease genes, but ignore the complex relations between entities in the biological knowledge graph. In this paper, we construct a biological knowledge graph centered on diseases and genes, and develop an end-to-end Knowledge graph completion model for Disease Gene Prediction using interactional tensor decomposition (called KDGene). KDGene introduces an interaction module between the embeddings of entities and relations to tensor decomposition, which can effectively enhance the information interaction in biological knowledge. Experimental results show that KDGene significantly outperforms state-of-the-art algorithms. Furthermore, the comprehensive biological analysis of the case of diabetes mellitus confirms KDGene's ability for identifying new and accurate candidate genes. This work proposes a scalable knowledge graph completion framework to identify disease candidate genes, from which the results are promising to provide valuable references for further wet experiments.

NAApr 23, 2018
Spectral approximation of convolution operator

Kuan Xu, Ana Loureiro

We develop a unified framework for constructing matrix approximations to the convolution operator of Volterra type defined by functions that are approximated using classical orthogonal polynomials on $[-1, 1]$. The numerically stable algorithms we propose exploit recurrence relations and symmetric properties satisfied by the entries of these convolution matrices. Laguerre-based convolution matrices that approximate Volterra convolution operator defined by functions on $[0, \infty]$ are also discussed for the sake of completeness.

AIFeb 6, 2023
A Pre-training Framework for Knowledge Graph Completion

Kuan Xu, Kuo Yang, Hanyang Dong et al. · tsinghua

Knowledge graph completion (KGC) is one of the effective methods to identify new facts in knowledge graph. Except for a few methods based on graph network, most of KGC methods trend to be trained based on independent triples, while are difficult to take a full account of the information of global network connection contained in knowledge network. To address these issues, in this study, we propose a simple and effective Network-based Pre-training framework for knowledge graph completion (termed NetPeace), which takes into account the information of global network connection and local triple relationships in knowledge graph. Experiments show that in NetPeace framework, multiple KGC models yields consistent and significant improvements on benchmarks (e.g., 36.45% Hits@1 and 27.40% MRR improvements for TuckER on FB15k-237), especially dense knowledge graph. On the challenging low-resource task, NetPeace that benefits from the global features of KG achieves higher performance (104.03% MRR and 143.89% Hit@1 improvements at most) than original models.

CAApr 26, 2018
Volterra-type convolution of classical polynomials

Ana F. Loureiro, Kuan Xu

We present a general framework for calculating the Volterra-type convolution of polynomials from an arbitrary polynomial sequence $\{P_k(x)\}_{k \geqslant 0}$ with $°P_k(x) = k$. Based on this framework, series representations for the convolutions of classical orthogonal polynomials, including Jacobi and Laguerre families, are derived, along with some relevant results pertaining to these new formulas.

78.5NAApr 28
Fractional calculus via variable-transform-based spectral approximations

Xiaolin Liu, Kuan Xu

We present a novel and unifying framework for constructing spectral approximations to fractional integral operators. These spectral approximations are based on transplanted Chebyshev polynomials, which are obtained by composing Chebyshev polynomials with a variable transform. When an algebraic transform is used, the framework produces spectral approximations based on Jacobi fractional polynomials. When an exponential transform is used, it yields a versatile spectral approximation that is applicable to a much broader class of fractional calculus problems. The construction of such spectral approximations is both numerically stable and optimal in terms of complexity. These spectral approximations lead to stable and fast spectral methods for fractional calculus. The spectral approximation based on the double-exponential transform is demonstrated through extensive numerical examples that are intractable for existing spectral methods.

LGOct 15, 2023
Enhancing Column Generation by Reinforcement Learning-Based Hyper-Heuristic for Vehicle Routing and Scheduling Problems

Kuan Xu, Li Shen, Lindong Liu

Column generation (CG) is a vital method to solve large-scale problems by dynamically generating variables. It has extensive applications in common combinatorial optimization, such as vehicle routing and scheduling problems, where each iteration step requires solving an NP-hard constrained shortest path problem. Although some heuristic methods for acceleration already exist, they are not versatile enough to solve different problems. In this work, we propose a reinforcement learning-based hyper-heuristic framework, dubbed RLHH, to enhance the performance of CG. RLHH is a selection module embedded in CG to accelerate convergence and get better integer solutions. In each CG iteration, the RL agent selects a low-level heuristic to construct a reduced network only containing the edges with a greater chance of being part of the optimal solution. In addition, we specify RLHH to solve two typical combinatorial optimization problems: Vehicle Routing Problem with Time Windows (VRPTW) and Bus Driver Scheduling Problem (BDSP). The total cost can be reduced by up to 27.9\% in VRPTW and 15.4\% in BDSP compared to the best lower-level heuristic in our tested scenarios, within equivalent or even less computational time. The proposed RLHH is the first RL-based CG method that outperforms traditional approaches in terms of solution quality, which can promote the application of CG in combinatorial optimization.

LGMar 7, 2025Code
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs

Ling Team, Binwei Zeng, Chao Huang et al.

In this technical report, we tackle the challenges of training large-scale Mixture of Experts (MoE) models, focusing on overcoming cost inefficiency and resource limitations prevalent in such systems. To address these issues, we present two differently sized MoE large language models (LLMs), namely Ling-Lite and Ling-Plus (referred to as "Bailing" in Chinese, spelled Bǎilíng in Pinyin). Ling-Lite contains 16.8 billion parameters with 2.75 billion activated parameters, while Ling-Plus boasts 290 billion parameters with 28.8 billion activated parameters. Both models exhibit comparable performance to leading industry benchmarks. This report offers actionable insights to improve the efficiency and accessibility of AI development in resource-constrained settings, promoting more scalable and sustainable technologies. Specifically, to reduce training costs for large-scale MoE models, we propose innovative methods for (1) optimization of model architecture and training processes, (2) refinement of training anomaly handling, and (3) enhancement of model evaluation efficiency. Additionally, leveraging high-quality data generated from knowledge graphs, our models demonstrate superior capabilities in tool use compared to other models. Ultimately, our experimental findings demonstrate that a 300B MoE LLM can be effectively trained on lower-performance devices while achieving comparable performance to models of a similar scale, including dense and MoE models. Compared to high-performance devices, utilizing a lower-specification hardware system during the pre-training phase demonstrates significant cost savings, reducing computing costs by approximately 20%. The models can be accessed at https://huggingface.co/inclusionAI.

74.2ROApr 23
A Deployable Embodied Vision-Language Navigation System with Hierarchical Cognition and Context-Aware Exploration

Kuan Xu, Ruimeng Liu, Yizhuo Yang et al.

Bridging the gap between embodied intelligence and embedded deployment remains a key challenge in intelligent robotic systems, where perception, reasoning, and planning must operate under strict constraints on computation, memory, energy, and real-time execution. In vision-language navigation (VLN), existing approaches often face a fundamental trade-off between strong reasoning capabilities and efficient deployment on real-world platforms. In this paper, we present a deployable embodied VLN system that achieves both high efficiency and robust high-level reasoning on real-world robotic platforms. To achieve this, we decouple the system into three asynchronous modules: a real-time perception module for continuous environment sensing, a memory integration module for spatial-semantic aggregation, and a reasoning module for high-level decision making. We incrementally construct a cognitive memory graph to encode scene information, which is further decomposed into subgraphs to enable reasoning with a vision-language model (VLM). To further improve navigation efficiency and accuracy, we also leverage the cognitive memory graph to formulate the exploration problem as a context-aware Weighted Traveling Repairman Problem (WTRP), which minimizes the weighted waiting time of viewpoints. Extensive experiments in both simulation and real-world robotic platforms demonstrate improved navigation success and efficiency over existing VLN approaches, while maintaining real-time performance on resource-constrained hardware.

CLOct 21, 2025Code
Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model

Ling Team, Anqi Shen, Baihui Li et al.

We present Ring-1T, the first open-source, state-of-the-art thinking model with a trillion-scale parameter. It features 1 trillion total parameters and activates approximately 50 billion per token. Training such models at a trillion-parameter scale introduces unprecedented challenges, including train-inference misalignment, inefficiencies in rollout processing, and bottlenecks in the RL system. To address these, we pioneer three interconnected innovations: (1) IcePop stabilizes RL training via token-level discrepancy masking and clipping, resolving instability from training-inference mismatches; (2) C3PO++ improves resource utilization for long rollouts under a token budget by dynamically partitioning them, thereby obtaining high time efficiency; and (3) ASystem, a high-performance RL framework designed to overcome the systemic bottlenecks that impede trillion-parameter model training. Ring-1T delivers breakthrough results across critical benchmarks: 93.4 on AIME-2025, 86.72 on HMMT-2025, 2088 on CodeForces, and 55.94 on ARC-AGI-1. Notably, it attains a silver medal-level result on the IMO-2025, underscoring its exceptional reasoning capabilities. By releasing the complete 1T parameter MoE model to the community, we provide the research community with direct access to cutting-edge reasoning capabilities. This contribution marks a significant milestone in democratizing large-scale reasoning intelligence and establishes a new baseline for open-source model performance.

CVNov 30, 2021Code
AirObject: A Temporally Evolving Graph Embedding for Object Identification

Nikhil Varma Keetha, Chen Wang, Yuheng Qiu et al.

Object encoding and identification are vital for robotic tasks such as autonomous exploration, semantic scene understanding, and re-localization. Previous approaches have attempted to either track objects or generate descriptors for object identification. However, such systems are limited to a "fixed" partial object representation from a single viewpoint. In a robot exploration setup, there is a requirement for a temporally "evolving" global object representation built as the robot observes the object from multiple viewpoints. Furthermore, given the vast distribution of unknown novel objects in the real world, the object identification process must be class-agnostic. In this context, we propose a novel temporal 3D object encoding approach, dubbed AirObject, to obtain global keypoint graph-based embeddings of objects. Specifically, the global 3D object embeddings are generated using a temporal convolutional network across structural information of multiple frames obtained from a graph attention-based encoding method. We demonstrate that AirObject achieves the state-of-the-art performance for video object identification and is robust to severe occlusion, perceptual aliasing, viewpoint shift, deformation, and scale transform, outperforming the state-of-the-art single-frame and sequential descriptors. To the best of our knowledge, AirObject is one of the first temporal object encoding methods. Source code is available at https://github.com/Nik-V9/AirObject.

ROOct 16, 2017Code
GroundSLAM: A Robust Visual SLAM System for Warehouse Robots Using Ground Textures

Kuan Xu, Zheng Yang, Lihua Xie et al.

A robust visual localization and mapping system is essential for warehouse robot navigation, as cameras offer a more cost-effective alternative to LiDAR sensors. However, existing forward-facing camera systems often encounter challenges in dynamic environments and open spaces, leading to significant performance degradation during deployment. To address these limitations, a localization system utilizing a single downward-facing camera to capture ground textures presents a promising solution. Nevertheless, existing feature-based ground-texture localization methods face difficulties when operating on surfaces with sparse features or repetitive patterns. To address this limitation, we propose GroundSLAM, a novel feature-free and ground-texture-based simultaneous localization and mapping (SLAM) system. GroundSLAM consists of three components: feature-free visual odometry, ground-texture-based loop detection and map optimization, and map reuse. Specifically, we introduce a kernel cross-correlator (KCC) for image-level pose tracking, loop detection, and map reuse to improve localization accuracy and robustness, and incorporate adaptive pruning strategies to enhance efficiency. Due to these specific designs, GroundSLAM is able to deliver efficient and stable localization across various ground surfaces such as those with sparse features and repetitive patterns. To advance research in this area, we introduce the first ground-texture dataset with precise ground-truth poses, consisting of 131k images collected from 10 kinds of indoor and outdoor ground surfaces. Extensive experimental results show that GroundSLAM outperforms state-of-the-art methods for both indoor and outdoor localization. We release our code and dataset at https://github.com/sair-lab/GroundSLAM.

ROFeb 24
LST-SLAM: A Stereo Thermal SLAM System for Kilometer-Scale Dynamic Environments

Zeyu Jiang, Kuan Xu, Changhao Chen

Thermal cameras offer strong potential for robot perception under challenging illumination and weather conditions. However, thermal Simultaneous Localization and Mapping (SLAM) remains difficult due to unreliable feature extraction, unstable motion tracking, and inconsistent global pose and map construction, particularly in dynamic large-scale outdoor environments. To address these challenges, we propose LST-SLAM, a novel large-scale stereo thermal SLAM system that achieves robust performance in complex, dynamic scenes. Our approach combines self-supervised thermal feature learning, stereo dual-level motion tracking, and geometric pose optimization. We also introduce a semantic-geometric hybrid constraint that suppresses potentially dynamic features lacking strong inter-frame geometric consistency. Furthermore, we develop an online incremental bag-of-words model for loop closure detection, coupled with global pose optimization to mitigate accumulated drift. Extensive experiments on kilometer-scale dynamic thermal datasets show that LST-SLAM significantly outperforms recent representative SLAM systems, including AirSLAM and DROID-SLAM, in both robustness and accuracy.

CLJun 17, 2025
Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs

Ling Team, Bin Hu, Cai Chen et al.

We present Ring-lite, a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL) to achieve efficient and robust reasoning capabilities. Built upon the publicly available Ling-lite model, a 16.8 billion parameter model with 2.75 billion activated parameters, our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks (e.g., AIME, LiveCodeBench, GPQA-Diamond) while activating only one-third of the parameters required by comparable models. To accomplish this, we introduce a joint training pipeline integrating distillation with RL, revealing undocumented challenges in MoE RL training. First, we identify optimization instability during RL training, and we propose Constrained Contextual Computation Policy Optimization(C3PO), a novel approach that enhances training stability and improves computational throughput via algorithm-system co-design methodology. Second, we empirically demonstrate that selecting distillation checkpoints based on entropy loss for RL training, rather than validation metrics, yields superior performance-efficiency trade-offs in subsequent RL training. Finally, we develop a two-stage training paradigm to harmonize multi-domain data integration, addressing domain conflicts that arise in training with mixed dataset. We will release the model, dataset, and code.

RODec 9, 2024
Enhancing Scene Coordinate Regression with Efficient Keypoint Detection and Sequential Information

Kuan Xu, Zeyu Jiang, Haozhi Cao et al.

Scene Coordinate Regression (SCR) is a visual localization technique that utilizes deep neural networks (DNN) to directly regress 2D-3D correspondences for camera pose estimation. However, current SCR methods often face challenges in handling repetitive textures and meaningless areas due to their reliance on implicit triangulation. In this paper, we propose an efficient and accurate SCR system. Compared to existing SCR methods, we propose a unified architecture for both scene encoding and salient keypoint detection, allowing our system to prioritize the encoding of informative regions. This design significantly improves computational efficiency. Additionally, we introduce a mechanism that utilizes sequential information during both mapping and relocalization. The proposed method enhances the implicit triangulation, especially in environments with repetitive textures. Comprehensive experiments conducted across indoor and outdoor datasets demonstrate that the proposed system outperforms state-of-the-art (SOTA) SCR methods. Our single-frame relocalization mode improves the recall rate of our baseline by 6.4% and increases the running speed from 56Hz to 90Hz. Furthermore, our sequence-based mode increases the recall rate by 11% while maintaining the original efficiency.

69.2ROMar 13
Beyond Imitation: Reinforcement Learning Fine-Tuning for Adaptive Diffusion Navigation Policies

Junhe Sheng, Ruofei Bai, Kuan Xu et al.

Diffusion-based robot navigation policies trained on large-scale imitation learning datasets, can generate multi-modal trajectories directly from the robot's visual observations, bypassing the traditional localization-mapping-planning pipeline and achieving strong zero-shot generalization. However, their performance remains constrained by the coverage of offline datasets, and when deployed in unseen settings, distribution shift often leads to accumulated trajectory errors and safety-critical failures. Adapting diffusion policies with reinforcement learning is challenging because their iterative denoising structure hinders effective gradient backpropagation, while also making the training of an additional value network computationally expensive and less stable. To address these issues, we propose a reinforcement learning fine-tuning framework tailored for diffusion-based navigation. The method leverages the inherent multi-trajectory sampling mechanism of diffusion models and adopts Group Relative Policy Optimization (GRPO), which estimates relative advantages across sampled trajectories without requiring a separate value network. To preserve pretrained representations while enabling adaptation, we freeze the visual encoder and selectively update the higher decoder layers and action head, enhancing safety-aware behaviors through online environmental feedback. On the PointGoal task in Isaac Sim, our approach improves the Success Rate from 52.0% to 58.7% and SPL from 0.49 to 0.54 on unseen scenes, while reducing collision frequency. Additional experiments show that the fine-tuned policy transfers zero-shot to a real quadruped platform and maintains stable performance in geometrically out-of-distribution environments, suggesting improved adaptability and safe generalization to new domains.

LGMay 29, 2025
Learning to Search for Vehicle Routing with Multiple Time Windows

Kuan Xu, Zhiguang Cao, Chenlong Zheng et al.

In this study, we propose a reinforcement learning-based adaptive variable neighborhood search (RL-AVNS) method designed for effectively solving the Vehicle Routing Problem with Multiple Time Windows (VRPMTW). Unlike traditional adaptive approaches that rely solely on historical operator performance, our method integrates a reinforcement learning framework to dynamically select neighborhood operators based on real-time solution states and learned experience. We introduce a fitness metric that quantifies customers' temporal flexibility to improve the shaking phase, and employ a transformer-based neural policy network to intelligently guide operator selection during the local search. Extensive computational experiments are conducted on realistic scenarios derived from the replenishment of unmanned vending machines, characterized by multiple clustered replenishment windows. Results demonstrate that RL-AVNS significantly outperforms traditional variable neighborhood search (VNS), adaptive VNS (AVNS), and state-of-the-art learning-based heuristics, achieving substantial improvements in solution quality and computational efficiency across various instance scales and time window complexities. Particularly notable is the algorithm's capability to generalize effectively to problem instances not encountered during training, underscoring its practical utility for complex logistics scenarios.

CLMay 17, 2021
SeaD: End-to-end Text-to-SQL Generation with Schema-aware Denoising

Kuan Xu, Yongbo Wang, Yongliang Wang et al.

In text-to-SQL task, seq-to-seq models often lead to sub-optimal performance due to limitations in their architecture. In this paper, we present a simple yet effective approach that adapts transformer-based seq-to-seq model to robust text-to-SQL generation. Instead of inducing constraint to decoder or reformat the task as slot-filling, we propose to train seq-to-seq model with Schema aware Denoising (SeaD), which consists of two denoising objectives that train model to either recover input or predict output from two novel erosion and shuffle noises. These denoising objectives acts as the auxiliary tasks for better modeling the structural data in S2S generation. In addition, we improve and propose a clause-sensitive execution guided (EG) decoding strategy to overcome the limitation of EG decoding for generative model. The experiments show that the proposed method improves the performance of seq-to-seq model in both schema linking and grammar correctness and establishes new state-of-the-art on WikiSQL benchmark. The results indicate that the capacity of vanilla seq-to-seq architecture for text-to-SQL may have been under-estimated.

ROMay 1, 2021
AirCode: A Robust Object Encoding Method

Kuan Xu, Chen Wang, Chao Chen et al.

Object encoding and identification are crucial for many robotic tasks such as autonomous exploration and semantic relocalization. Existing works heavily rely on the tracking of detected objects but have difficulty recalling revisited objects precisely. In this paper, we propose a novel object encoding method, which is named as AirCode, based on a graph of key-points. To be robust to the number of key-points detected, we propose a feature sparse encoding and object dense encoding method to ensure that each key-point can only affect a small part of the object descriptors, leading it to be robust to viewpoint changes, scaling, occlusion, and even object deformation. In the experiments, we show that it achieves superior performance for object identification than the state-of-the-art algorithms and is able to provide reliable semantic relocalization. It is a plug-and-play module and we expect that it will play an important role in various applications.