Kailai Li

RO
h-index22
6papers
275citations
Novelty52%
AI Score35

6 Papers

ROOct 11, 2023
Co-NavGPT: Multi-Robot Cooperative Visual Semantic Navigation Using Vision Language Models

Bangguo Yu, Qihao Yuan, Kailai Li et al.

Visual target navigation is a critical capability for autonomous robots operating in unknown environments, particularly in human-robot interaction scenarios. While classical and learning-based methods have shown promise, most existing approaches lack common-sense reasoning and are typically designed for single-robot settings, leading to reduced efficiency and robustness in complex environments. To address these limitations, we introduce Co-NavGPT, a novel framework that integrates a Vision Language Model (VLM) as a global planner to enable common-sense multi-robot visual target navigation. Co-NavGPT aggregates sub-maps from multiple robots with diverse viewpoints into a unified global map, encoding robot states and frontier regions. The VLM uses this information to assign frontiers across the robots, facilitating coordinated and efficient exploration. Experiments on the Habitat-Matterport 3D (HM3D) demonstrate that Co-NavGPT outperforms existing baselines in terms of success rate and navigation efficiency, without requiring task-specific training. Ablation studies further confirm the importance of semantic priors from the VLM. We also validate the framework in real-world scenarios using quadrupedal robots. Supplementary video and code are available at: https://sites.google.com/view/co-navgpt2.

LGMar 13, 2023
Gaussian Process on the Product of Directional Manifolds

Ziyu Cao, Kailai Li

We present a principled study on defining Gaussian processes (GPs) with inputs on the product of directional manifolds. A circular kernel is first presented according to the von Mises distribution. Based thereon, the hypertoroidal von Mises (HvM) kernel is proposed to establish GPs on hypertori with consideration of correlated circular components. The proposed HvM kernel is demonstrated with multi-output GP regression for learning vector-valued functions on hypertori using the intrinsic coregionalization model. Analytic derivatives for hyperparameter optimization are provided for runtime-critical applications. For evaluation, we synthesize a ranging-based sensor network and employ the HvM-based GPs for data-driven recursive localization. Numerical results show that the HvM-based GP achieves superior tracking accuracy compared to parametric model and GPs of conventional kernel designs.

CVNov 21, 2024Code
Solving Zero-Shot 3D Visual Grounding as Constraint Satisfaction Problems

Qihao Yuan, Kailai Li, Jiaming Zhang

3D visual grounding (3DVG) aims to locate objects in a 3D scene with natural language descriptions. Supervised methods have achieved decent accuracy, but have a closed vocabulary and limited language understanding ability. Zero-shot methods utilize large language models (LLMs) to handle natural language descriptions, where the LLM either produces grounding results directly or generates programs that compute results (symbolically). In this work, we propose a zero-shot method that reformulates the 3DVG task as a Constraint Satisfaction Problem (CSP), where the variables and constraints represent objects and their spatial relations, respectively. This allows a global symbolic reasoning of all relevant objects, producing grounding results of both the target and anchor objects. Moreover, we demonstrate the flexibility of our framework by handling negation- and counting-based queries with only minor extra coding efforts. Our system, Constraint Satisfaction Visual Grounding (CSVG), has been extensively evaluated on the public datasets ScanRefer and Nr3D datasets using only open-source LLMs. Results show the effectiveness of CSVG and superior grounding accuracy over current state-of-the-art zero-shot 3DVG methods with improvements of $+7.0\%$ (Acc@0.5 score) and $+11.2\%$ on the ScanRefer and Nr3D datasets, respectively. The code of our system is available at https://asig-x.github.io/csvg_web.

ROOct 25, 2020Code
Towards High-Performance Solid-State-LiDAR-Inertial Odometry and Mapping

Kailai Li, Meng Li, Uwe D. Hanebeck

We present a novel tightly-coupled LiDAR-inertial odometry and mapping scheme for both solid-state and mechanical LiDARs. As frontend, a feature-based lightweight LiDAR odometry provides fast motion estimates for adaptive keyframe selection. As backend, a hierarchical keyframe-based sliding window optimization is performed through marginalization for directly fusing IMU and LiDAR measurements. For the Livox Horizon, a newly released solid-state LiDAR, a novel feature extraction method is proposed to handle its irregular scan pattern during preprocessing. LiLi-OM (Livox LiDAR-inertial odometry and mapping) is real-time capable and achieves superior accuracy over state-of-the-art systems for both LiDAR types on public data sets of mechanical LiDARs and in experiments using the Livox Horizon. Source code and recorded experimental data sets are available at https://github.com/KIT-ISAS/lili-om.

ROFeb 5, 2025
IRIS: An Immersive Robot Interaction System

Xinkai Jiang, Qihao Yuan, Enes Ulas Dincer et al.

This paper introduces IRIS, an Immersive Robot Interaction System leveraging Extended Reality (XR). Existing XR-based systems enable efficient data collection but are often challenging to reproduce and reuse due to their specificity to particular robots, objects, simulators, and environments. IRIS addresses these issues by supporting immersive interaction and data collection across diverse simulators and real-world scenarios. It visualizes arbitrary rigid and deformable objects, robots from simulation, and integrates real-time sensor-generated point clouds for real-world applications. Additionally, IRIS enhances collaborative capabilities by enabling multiple users to simultaneously interact within the same virtual scene. Extensive experiments demonstrate that IRIS offers efficient and intuitive data collection in both simulated and real-world settings.

ROJul 31, 2019
Improved Pose Graph Optimization for Planar Motions Using Riemannian Geometry on the Manifold of Dual Quaternions

Kailai Li, Johannes Cox, Benjamin Noack et al.

We present a novel Riemannian approach for planar pose graph optimization problems. By formulating the cost function based on the Riemannian metric on the manifold of dual quaternions representing planar motions, the nonlinear structure of the SE(2) group is inherently considered. To solve the on-manifold least squares problem, a Riemannian Gauss-Newton method using the exponential retraction is applied. The proposed Riemannian pose graph optimizer (RPG-Opt) is further evaluated based on public planar pose graph data sets. Compared with state-of-the-art frameworks, the proposed method gives equivalent accuracy and better convergence robustness under large uncertainties of odometry measurements.