Shoudong Huang

h-index: 9

21 papers

645 citations

Novelty: 54%

AI Score: 50

Ranked #44,342 of 201,326 authors (top 22%); #1,229 in RO (top 16%)

21 Papers

RO · Sep 11, 2023 · Code

CARE: Confidence-rich Autonomous Robot Exploration using Bayesian Kernel Inference and Optimization

Yang Xu, Ronghao Zheng, Senlin Zhang et al.

In this paper, we consider improving the efficiency of information-based autonomous robot exploration in unknown and complex environments. We first utilize Gaussian process (GP) regression to learn a surrogate model to infer the confidence-rich mutual information (CRMI) of querying control actions, then adopt an objective function consisting of predicted CRMI values and prediction uncertainties to conduct Bayesian optimization (BO), i.e., GP-based BO (GPBO). The trade-off between the best action with the highest CRMI value (exploitation) and the action with high prediction variance (exploration) can be realized. To further improve the efficiency of GPBO, we propose a novel lightweight information gain inference method based on Bayesian kernel inference and optimization (BKIO), achieving an approximate logarithmic complexity without the need for training. BKIO can also infer the CRMI and generate the best action using BO with bounded cumulative regret, which ensures its comparable accuracy to GPBO with much higher efficiency. Extensive numerical and real-world experiments show the desired efficiency of our proposed methods without losing exploration performance in different unstructured, cluttered environments. We also provide our open-source implementation code at https://github.com/Shepherd-Gregory/BKIO-Exploration.

RO · Apr 16

POMDP-based Object Search with Growing State Space and Hybrid Action Domain

Yongbo Chen, Hesheng Wang, Shoudong Huang et al.

Efficiently locating target objects in complex indoor environments with diverse furniture, such as shelves, tables, and beds, is a significant challenge for mobile robots. This difficulty arises from factors like localization errors, limited fields of view, and visual occlusion. We address this by framing the object-search task as a high-dimensional Partially Observable Markov Decision Process (POMDP) with a growing state space and hybrid (continuous and discrete) action spaces in 3D environments. Based on a meticulously designed perception module, a novel online POMDP solver named the growing neural process filtered k-center clustering tree (GNPF-kCT) is proposed to tackle this problem. Optimal actions are selected using Monte Carlo Tree Search (MCTS) with belief tree reuse for growing state space, a neural process network to filter useless primitive actions, and k-center clustering hypersphere discretization for efficient refinement of high-dimensional action spaces. A modified upper-confidence bound (UCB), informed by belief differences and action value functions within cells of estimated diameters, guides MCTS expansion. Theoretical analysis validates the convergence and performance potential of our method. To address scenarios with limited information or rewards, we also introduce a guessed target object with a grid-world model as a key strategy to enhance search efficiency. Extensive Gazebo simulations with Fetch and Stretch robots demonstrate faster and more reliable target localization than POMDP-based baselines and state-of-the-art (SOTA) non-POMDP-based solvers, especially large language model (LLM) based methods, in object search under the same computational constraints and perception systems. Real-world tests in office environments confirm the practical applicability of our approach. Project page: https://sites.google.com/view/gnpfkct.

CV · Oct 2, 2025

Non-Rigid Structure-from-Motion via Differential Geometry with Recoverable Conformal Scale

Yongbo Chen, Yanhao Zhang, Shaifali Parashar et al.

Non-rigid structure-from-motion (NRSfM), a promising technique for addressing the mapping challenges in monocular visual deformable simultaneous localization and mapping (SLAM), has attracted growing attention. We introduce a novel method, called Con-NRSfM, for NRSfM under conformal deformations, encompassing isometric deformations as a subset. Our approach performs point-wise reconstruction using 2D selected image warps optimized through a graph-based framework. Unlike existing methods that rely on strict assumptions, such as locally planar surfaces or locally linear deformations, and fail to recover the conformal scale, our method eliminates these constraints and accurately computes the local conformal scale. Additionally, our framework decouples constraints on depth and conformal scale, which are inseparable in other approaches, enabling more precise depth estimation. To address the sensitivity of the formulated problem, we employ a parallel separable iterative optimization strategy. Furthermore, a self-supervised learning framework, utilizing an encoder-decoder network, is incorporated to generate dense 3D point clouds with texture. Simulation and experimental results using both synthetic and real datasets demonstrate that our method surpasses existing approaches in terms of reconstruction accuracy and robustness. The code for the proposed method will be made publicly available on the project website: https://sites.google.com/view/con-nrsfm.

CV · Jun 18, 2025

Correspondence-Free Multiview Point Cloud Registration via Depth-Guided Joint Optimisation

Yiran Zhou, Yingyu Wang, Shoudong Huang et al.

Multiview point cloud registration is a fundamental task for constructing globally consistent 3D models. Existing approaches typically rely on feature extraction and data association across multiple point clouds; however, these processes make it challenging to obtain a globally optimal solution in complex environments. In this paper, we introduce a novel correspondence-free multiview point cloud registration method. Specifically, we represent the global map as a depth map and leverage raw depth information to formulate a non-linear least squares optimisation that jointly estimates the poses of the point clouds and the global map. Unlike traditional feature-based bundle adjustment methods, which rely on explicit feature extraction and data association, our method bypasses these challenges by associating multi-frame point clouds with a global depth map through their corresponding poses. This data association is implicitly incorporated and dynamically refined during the optimisation process. Extensive evaluations on real-world datasets demonstrate that our method outperforms state-of-the-art approaches in accuracy, particularly in challenging environments where feature extraction and data association are difficult.

RO · Sep 11, 2021

A Right Invariant Extended Kalman Filter for Object based SLAM

Yang Song, Zhuqing Zhang, Jun Wu et al.

With recent advances in deep learning based object recognition and estimation, it is possible to consider object level SLAM where the pose of each object is estimated in the SLAM process. In this paper, based on a novel Lie group structure, a right invariant extended Kalman filter (RI-EKF) for object based SLAM is proposed. The observability analysis shows that the proposed algorithm automatically maintains the correct unobservable subspace, while the standard EKF (Std-EKF) based SLAM algorithm does not. This results in better consistency for the proposed algorithm compared to Std-EKF. Finally, simulations and real world experiments validate not only the consistency and accuracy of the proposed algorithm, but also the practicability of the proposed RI-EKF for the object based SLAM problem. The MATLAB code of the algorithm is made publicly available.

CV · May 21, 2021

DSR: Direct Simultaneous Registration for Multiple 3D Images

Zhehua Mao, Liang Zhao, Shoudong Huang et al.

This paper presents a novel algorithm named Direct Simultaneous Registration (DSR) that registers a collection of 3D images in a simultaneous fashion without specifying any reference image, feature extraction and matching, or information loss or reuse. The algorithm optimizes the global poses of local image frames by maximizing the similarity between a predefined panoramic image and local images. Although we formulate the problem as a Direct Bundle Adjustment (DBA) that jointly optimizes the poses of local frames and the intensities of the panoramic image, by investigating the independence of pose estimation from the panoramic image in the solving process, DSR is proposed to solve the poses only and proved to be able to obtain the same optimal poses as DBA. The proposed method is particularly suitable for the scenarios where distinct features are not available, such as Transesophageal Echocardiography (TEE) images. DSR is evaluated by comparing it with four widely used methods via simulated and in-vivo 3D TEE images. It is shown that the proposed method outperforms these four methods in terms of accuracy and requires much fewer computational resources than the state-of-the-art accumulated pairwise estimates (APE).

RO · Mar 21, 2021

Toward Consistent Drift-free Visual Inertial Localization on Keyframe Based Map

Zhuqing Zhang, Yanmei Jiao, Shoudong Huang et al.

Global localization is essential for robots to perform further tasks like navigation. In this paper, we propose a new framework to perform global localization based on a filter-based visual-inertial odometry framework MSCKF. To reduce the computation and memory consumption, we only maintain the keyframe poses of the map and employ Schmidt-EKF to update the state. This global localization framework is shown to be able to maintain the consistency of the state estimator. Furthermore, we introduce a re-linearization mechanism during the updating phase. This mechanism could ease the linearization error of observation function to make the state estimation more precise. The experiments show that this mechanism is crucial for large and challenging scenes. Simulations and experiments demonstrate the effectiveness and consistency of our global localization framework.

CV · Mar 22, 2020

Dynamic Reconstruction of Deformable Soft-tissue with Stereo Scope in Minimal Invasive Surgery

Jingwei Song, Jun Wang, Liang Zhao et al.

In minimal invasive surgery, it is important to rebuild and visualize the latest deformed shape of soft-tissue surfaces to mitigate tissue damage. This paper proposes an innovative Simultaneous Localization and Mapping (SLAM) algorithm for deformable dense reconstruction of surfaces using a sequence of images from a stereoscope. We introduce a warping field based on the Embedded Deformation (ED) nodes with 3D shapes recovered from consecutive pairs of stereo images. The warping field is estimated by deforming the last updated model to the current live model. Our SLAM system can: (1) Incrementally build a live model by progressively fusing new observations with vivid accurate texture. (2) Estimate the deformed shape of unobserved regions using the As-Rigid-As-Possible principle. (3) Show the consecutive shape of models. (4) Estimate the current relative pose between the soft-tissue and the scope. In-vivo experiments with publicly available datasets demonstrate that the 3D models can be incrementally built for different soft-tissues with different deformations from sequences of stereo images obtained by laparoscopes. Results show the potential clinical application of our SLAM system for providing surgeons with useful shape and texture information in minimal invasive surgery.

RO · Feb 27, 2020

Globally optimal consensus maximization for robust visual inertial localization in point and line map

Yanmei Jiao, Yue Wang, Bo Fu et al.

Map based visual inertial localization is a crucial step to reduce the drift in state estimation of mobile robots. The underlying problem for localization is to estimate the pose from a set of 3D-2D feature correspondences, of which the main challenge is the presence of outliers, especially in changing environments. In this paper, we propose a robust solution based on efficient global optimization of the consensus maximization problem, which is insensitive to a high percentage of outliers. We first introduce translation invariant measurements (TIMs) for both points and lines to decouple the consensus maximization problem into rotation and translation subproblems, allowing for a two-stage solver with reduced solution dimensions. Then we show that (i) the rotation can be calculated by minimizing TIMs using only 1-dimensional branch-and-bound (BnB), (ii) the translation can be found by running a 1-dimensional search three times with prioritized progressive voting. Compared with the popular randomized solver, our solver achieves deterministic global convergence without depending on an initial value. Compared with existing BnB based methods, ours is exponentially faster. Finally, by evaluating the performance on both simulation and real-world datasets, our approach gives an accurate pose even when there are 90% outliers (only 2 inliers).

OC · Nov 13, 2019

Analysis of minima for geodesic and chordal cost for a minimal 2D pose-graph SLAM problem

Felix H. Kong, Jiaheng Zhao, Liang Zhao et al.

In this paper, we show that for a minimal pose-graph problem, even in the ideal case of perfect measurements and spherical covariance, using the so-called "wrap function" when comparing angles results in multiple suboptimal local minima. We numerically estimate regions of attraction to these local minima for some numerical examples, and give evidence to show that they are of nonzero measure. In contrast, under the same assumptions, we show that the chordal distance representation of angle error has a unique minimum up to periodicity. For chordal cost, we also search for initial conditions that fail to converge to the global minimum, and find that this occurs with far fewer points than with geodesic cost.
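As a rough illustration of the two angle-error costs contrasted in this abstract, the sketch below compares a wrapped (geodesic) squared error with a chordal squared error for a single pair of 2D headings. The function names, and the choice of the squared Frobenius norm of the rotation-matrix difference as the chordal cost, are assumptions made for illustration and may differ from the paper's exact formulation.

```python
import math


def wrap(theta: float) -> float:
    """Map an angle to the principal interval [-pi, pi]."""
    return math.atan2(math.sin(theta), math.cos(theta))


def geodesic_cost(a: float, b: float) -> float:
    """Squared wrapped angle difference (assumed form of the geodesic cost)."""
    return wrap(a - b) ** 2


def chordal_cost(a: float, b: float) -> float:
    """Squared Frobenius norm of R(a) - R(b) for 2D rotation matrices.

    Each rotation matrix stores cos and sin twice, hence the factor 2.
    Algebraically this equals 4 * (1 - cos(a - b)).
    """
    dc = math.cos(a) - math.cos(b)
    ds = math.sin(a) - math.sin(b)
    return 2.0 * (dc * dc + ds * ds)


if __name__ == "__main__":
    # Print both costs for a few heading errors measured against zero.
    for deg in (0, 90, 179, 181, 360):
        t = math.radians(deg)
        print(deg, geodesic_cost(t, 0.0), chordal_cost(t, 0.0))
```

Both costs vanish when the two angles differ by a multiple of 2π. The wrapped error changes sign abruptly as the difference crosses ±π, whereas the chordal cost is a smooth function of the angle difference everywhere.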

RO · Jun 20, 2019

An observable time series based SLAM algorithm for deforming environment

Jingwei Song, Liang Zhao, Shoudong Huang et al.

In this paper, we study the back-end of the simultaneous localization and mapping (SLAM) problem in deforming environments, where a robot localizes itself and tracks multiple non-rigid soft surfaces using its onboard sensor measurements. An elaborate analysis is conducted on the conventional deformation modelling method, the Embedded Deformation (ED) graph. We demonstrate and prove that the ED graph widely used in such scenarios is unobservable and leads to multiple solutions unless suitable priors are provided. An example as well as a theoretical proof are provided to show the ambiguity between the ED graph and the camera pose. When modelling non-rigid scenarios with an ED graph, motion priors of the deforming environment are essential to separate the robot pose from the deforming environment. The conclusion can be extrapolated to any free form deformation formulation. To address the observability problem, this research proposes a preliminary deformable SLAM approach to estimate robot pose in complex environments that exhibit regular motion. A strategy that approximates the deformed shape using a linear combination of several previous shapes is proposed to avoid the ambiguity between robot movement and the rigid and non-rigid motions of the environment. Fisher information matrix rank analysis with a base case is discussed to prove the effectiveness. Moreover, the proposed algorithm is validated extensively on Monte Carlo simulations and real experiments. It is demonstrated that the new algorithm significantly outperforms conventional rigid SLAM and ED based SLAM, especially in scenarios where there is large deformation.

RO · Jun 20, 2019

Efficient two step optimization for large embedded deformation graph based SLAM

Jingwei Song, Fang Bai, Liang Zhao et al.

Embedded deformation node based formulations have been widely applied in deformable geometry and graphical problems. Though promising in stereo (or RGBD) sensor based SLAM applications, it remains challenging to keep a constant speed in deformation node parameter estimation when the model grows larger. In practice, the processing time grows rapidly as the map expands. In this paper, we propose an approach to decouple nodes of the deformation graph in large scale dense deformable SLAM and keep the estimation time constant. We observe that only some of the deformation nodes in the graph are connected to visible points. Based on this fact, the sparsity of the original Hessian matrix is utilized to split parameter estimation into two independent steps. With this new technique, we achieve faster parameter estimation with amortized computational complexity reduced from O(n^2) to close to O(1). As a result, the computation cost barely increases as the map keeps growing. Based on our strategy, the computational bottleneck in large scale embedded deformation graph based applications will be greatly mitigated. The effectiveness is validated by experiments featuring large scale deformation scenarios.

RO · May 23, 2019

IN2LAAMA: INertial Lidar Localisation Autocalibration And MApping

Cedric Le Gentil, Teresa Vidal-Calleja, Shoudong Huang

In this paper, we present INertial Lidar Localisation Autocalibration And MApping (IN2LAAMA): an offline probabilistic framework for localisation, mapping, and extrinsic calibration based on a 3D-lidar and a 6-DoF-IMU. Most of today's lidars collect geometric information about the surrounding environment by sweeping lasers across their field of view. Consequently, 3D-points in one lidar scan are acquired at different timestamps. If the sensor trajectory is not accurately known, the scans are affected by the phenomenon known as motion distortion. The proposed method leverages preintegration with a continuous representation of the inertial measurements to characterise the system's motion at any point in time. It enables precise correction of the motion distortion without relying on any explicit motion model. The system's pose, velocity, biases, and time-shift are estimated via a full batch optimisation that includes automatically generated loop-closure constraints. The autocalibration and the registration of lidar data rely on planar and edge features matched across pairs of scans. The performance of the framework is validated through simulated and real-data experiments.

RO · Jan 28, 2019

Online Estimation of Ocean Current from Sparse GPS Data for Underwater Vehicles

Ki Myung Brian Lee, Chanyeol Yoo, Ben Hollings et al.

Underwater robots are subject to position drift due to the effect of ocean currents and the lack of accurate localisation while submerged. We are interested in exploiting such position drift to estimate the ocean current in the surrounding area, thereby assisting navigation and planning. We present a Gaussian process (GP)-based expectation-maximisation (EM) algorithm that estimates the underlying ocean current using sparse GPS data obtained on the surface and dead-reckoned position estimates. We first develop a specialised GP regression scheme that exploits the incompressibility of ocean currents to counteract the underdetermined nature of the problem. We then use the proposed regression scheme in an EM algorithm that estimates the best-fitting ocean current in between each GPS fix. The proposed algorithm is validated in simulation and on a real dataset, and is shown to be capable of reconstructing the underlying ocean current field. We expect to use this algorithm to close the loop between planning and estimation for underwater navigation in unknown ocean currents.

RO · Sep 18, 2018

Linear SLAM: Linearising the SLAM Problems using Submap Joining

Liang Zhao, Shoudong Huang, Gamini Dissanayake

The main contribution of this paper is a new submap joining based approach for solving large-scale Simultaneous Localization and Mapping (SLAM) problems. Each local submap is independently built using the local information through solving a small-scale SLAM; the joining of submaps mainly involves solving linear least squares and performing nonlinear coordinate transformations. Through approximating the local submap information as the state estimate and its corresponding information matrix, judiciously selecting the submap coordinate frames, and approximating the joining of a large number of submaps by joining only two maps at a time, either sequentially or in a more efficient Divide and Conquer manner, the nonlinear optimization process involved in most of the existing submap joining approaches is avoided. Thus the proposed submap joining algorithm does not require initial guess or iterations since linear least squares problems have closed-form solutions. The proposed Linear SLAM technique is applicable to feature-based SLAM, pose graph SLAM and D-SLAM, in both two and three dimensions, and does not require any assumption on the character of the covariance matrices. Simulations and experiments are performed to evaluate the proposed Linear SLAM algorithm. Results using publicly available datasets in 2D and 3D show that Linear SLAM produces results that are very close to the best solutions that can be obtained using full nonlinear optimization algorithm started from an accurate initial guess. The C/C++ and MATLAB source codes of Linear SLAM are available on OpenSLAM.
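The claim that linear least squares needs no initial guess or iterations can be illustrated with a scalar toy case. The sketch below fuses two estimates of the same quantity, each given as a value with an information (inverse variance) weight, in closed form. It is a simplification under stated assumptions: the algorithm described above works with state vectors, information matrices, and coordinate transformations, and the function name `fuse` is hypothetical.

```python
from typing import Tuple


def fuse(x1: float, info1: float, x2: float, info2: float) -> Tuple[float, float]:
    """Closed-form minimiser of info1*(x - x1)**2 + info2*(x - x2)**2.

    Setting the derivative to zero gives an information-weighted mean;
    the fused information is the sum of the two weights.
    """
    total = info1 + info2
    if total <= 0.0:
        raise ValueError("total information must be positive")
    return (info1 * x1 + info2 * x2) / total, total


if __name__ == "__main__":
    # Two estimates of the same landmark coordinate with different confidence.
    estimate, information = fuse(0.0, 3.0, 4.0, 1.0)
    print(estimate, information)
```

Because the cost is quadratic, the answer does not depend on any starting point, which is the property the abstract relies on when it replaces nonlinear optimization with linear least squares.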

RO · Jul 10, 2018

Parallax Bundle Adjustment on Manifold with Convexified Initialization

Liyang Liu, Teng Zhang, Yi Liu et al.

Bundle adjustment (BA) with parallax angle based feature parameterization has been shown to have superior performance over BA using inverse depth or XYZ feature forms. In this paper, we propose an improved version of the parallax BA algorithm (PMBA) by extending it to the manifold domain along with an observation-ray based objective function. With this modification, the problem formulation faithfully mimics the projective nature of a camera's image formation, and BA is able to achieve better convergence, accuracy and robustness. This is particularly useful in handling diverse outdoor environments and collinear motion modes. Capitalizing on these properties, we further propose a pose-graph simplification to PMBA, with significant dimensionality reduction. This pose-graph model is convex in nature, easy to solve and its solution can serve as a good initial guess to the original BA problem which is intrinsically non-convex. We provide theoretical proof that our global initialization strategy can guarantee a near-optimal solution. Using a series of experiments involving diverse environmental conditions and motions, we demonstrate PMBA's superior convergence performance in comparison to other BA methods. We also show that, without incremental initialization or via third-party information, our global initialization process helps to bootstrap the full BA successfully in various scenarios, sequential or out-of-order, including some datasets from the "Bundle Adjustment in the Large" database.

CV · Mar 6, 2018

MIS-SLAM: Real-time Large Scale Dense Deformable SLAM System in Minimal Invasive Surgery Based on Heterogeneous Computing

Jingwei Song, Jun Wang, Liang Zhao et al.

Real-time simultaneous localization and dense mapping is very helpful for providing Virtual Reality and Augmented Reality for surgeons or even surgical robots. In this paper, we propose MIS-SLAM: a complete real-time large scale dense deformable SLAM system with stereoscope in Minimal Invasive Surgery based on heterogeneous computing by making full use of CPU and GPU. The idle CPU is used to perform ORB-SLAM to provide a robust global pose. Strategies are taken to integrate modules from CPU and GPU. We solve the key problem raised in previous work, namely that fast movement of the scope and blurry images make the scope tracking fail. Benefiting from improved localization, MIS-SLAM can achieve large scale scope localization and dense mapping in real-time. It transforms and deforms the current model and incrementally fuses new observations while keeping vivid texture. In-vivo experiments conducted on publicly available datasets, presented in the form of videos, demonstrate the feasibility and practicality of MIS-SLAM for potential clinical purposes.

RO · Feb 25, 2017

An Invariant-EKF VINS Algorithm for Improving Consistency

Teng Zhang, Kanzhi Wu, Daobilige Su et al.

The main contribution of this paper is an invariant extended Kalman filter (EKF) for visual inertial navigation systems (VINS). It is demonstrated that the conventional EKF based VINS is not invariant under the stochastic unobservable transformation, associated with translations and a rotation about the gravitational direction. This can lead to inconsistent state estimates as the estimator does not obey a fundamental property of the physical system. To address this issue, we use a novel uncertainty representation to derive a Right Invariant error extended Kalman filter (RIEKF-VINS) that preserves this invariance property. RIEKF-VINS is then adapted to the multistate constraint Kalman filter framework to obtain a consistent state estimator. Both Monte Carlo simulations and real-world experiments are used to validate the proposed method.

RO · Feb 22, 2017

Convergence and Consistency Analysis for A 3D Invariant-EKF SLAM

Teng Zhang, Kanzhi Wu, Jingwei Song et al.

In this paper, we investigate the convergence and consistency properties of an Invariant-Extended Kalman Filter (RI-EKF) based Simultaneous Localization and Mapping (SLAM) algorithm. Basic convergence properties of this algorithm are proven. These proofs do not require the restrictive assumption that the Jacobians of the motion and observation models need to be evaluated at the ground truth. It is also shown that the output of RI-EKF is invariant under any stochastic rigid body transformation in contrast to $\mathbb{SO}(3)$ based EKF SLAM algorithm ($\mathbb{SO}(3)$-EKF) that is only invariant under deterministic rigid body transformation. Implications of these invariance properties on the consistency of the estimator are also discussed. Monte Carlo simulation results demonstrate that RI-EKF outperforms $\mathbb{SO}(3)$-EKF, Robocentric-EKF and the "First Estimates Jacobian" EKF, for 3D point feature based SLAM.

RO · Nov 3, 2016

Designing Sparse Reliable Pose-Graph SLAM: A Graph-Theoretic Approach

Kasra Khosoussi, Gaurav S. Sukhatme, Shoudong Huang et al.

In this paper, we aim to design sparse D-optimal (determinant-optimal) pose-graph SLAM problems through the synthesis of sparse graphs with the maximum weighted number of spanning trees. Characterizing graphs with the maximum number of spanning trees is an open problem in general. To tackle this problem, several new theoretical results are established in this paper, including the monotone log-submodularity of the weighted number of spanning trees. By exploiting these structures, we design a complementary pair of near-optimal efficient approximation algorithms with provable guarantees. Our theoretical results are validated using random graphs and a publicly available pose-graph SLAM dataset.
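The quantity at the centre of this abstract, the weighted number of spanning trees, can be computed with Kirchhoff's matrix-tree theorem: it equals the determinant of the weighted graph Laplacian with one row and the matching column removed. The sketch below is a small exact implementation for illustration only; the function names and the edge-list format are assumptions, and it does not implement the paper's approximation algorithms.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

Edge = Tuple[int, int, int]  # (node_i, node_j, positive weight)


def determinant(matrix: Sequence[Sequence[Fraction]]) -> Fraction:
    """Exact determinant by Gaussian elimination over rationals."""
    a: List[List[Fraction]] = [list(row) for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def weighted_spanning_trees(num_nodes: int, edges: Sequence[Edge]) -> Fraction:
    """Sum over spanning trees of the product of their edge weights."""
    lap = [[Fraction(0)] * num_nodes for _ in range(num_nodes)]
    for i, j, w in edges:
        lap[i][i] += w
        lap[j][j] += w
        lap[i][j] -= w
        lap[j][i] -= w
    # Matrix-tree theorem: drop row 0 and column 0, then take the determinant.
    reduced = [row[1:] for row in lap[1:]]
    return determinant(reduced)


if __name__ == "__main__":
    triangle = [(0, 1, 1), (1, 2, 2), (0, 2, 3)]
    print(weighted_spanning_trees(3, triangle))
```

Adding an edge to a graph can only keep or increase this count, and a disconnected graph gives zero, which is why the count serves as a measure of how well connected a pose graph is.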

RO · Jul 6, 2016

Fast, On-board, Model-aided Visual-Inertial Odometry System for Quadrotor Micro Aerial Vehicles

Dinuka Abeywardena, Shoudong Huang, Ben Barnes et al.

The main contribution of this paper is a high frequency, low-complexity, on-board visual-inertial odometry system for quadrotor micro air vehicles. The system consists of an extended Kalman filter (EKF) based state estimation algorithm that fuses information from a low cost MEMS inertial measurement unit acquired at 200Hz and VGA resolution images from a monocular camera at 50Hz. The dynamic model describing the quadrotor motion is employed in the estimation algorithm as a third source of information. Visual information is incorporated into the EKF by enforcing the epipolar constraint on features tracked between image pairs, avoiding the need to explicitly estimate the location of the tracked environmental features. Combined use of the dynamic model and epipolar constraints makes it possible to obtain drift free velocity and attitude estimates in the presence of both accelerometer and gyroscope biases. A strategy to deal with the unobservability that arises when the quadrotor is in hover is also provided. Experimental data from a real-time implementation of the system on a 50 gram embedded computer are presented in addition to the simulations to demonstrate the efficacy of the proposed system.
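The epipolar constraint mentioned in this abstract ties together the same feature seen in two images without requiring its 3D position. The sketch below checks the constraint for normalized image coordinates, assuming the convention that a point X1 in the first camera frame maps to X2 = R X1 + t in the second, so that the essential matrix is E = [t]x R and x2^T E x1 = 0. All function names are hypothetical, and the paper's EKF measurement model is not reproduced here.

```python
from typing import List, Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def skew(t: Vec3) -> List[List[float]]:
    """Cross-product matrix [t]x, so that skew(t) applied to v equals t x v."""
    return [[0.0, -t[2], t[1]],
            [t[2], 0.0, -t[0]],
            [-t[1], t[0], 0.0]]


def matmul(a: Mat3, b: Mat3) -> List[List[float]]:
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(a: Mat3, v: Vec3) -> List[float]:
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def project(point: Vec3) -> List[float]:
    """Normalized image coordinates of a 3D point in the camera frame."""
    return [point[0] / point[2], point[1] / point[2], 1.0]


def transform(rotation: Mat3, translation: Vec3, point: Vec3) -> List[float]:
    """Express a point from camera 1 in the camera 2 frame: R X + t."""
    rotated = matvec(rotation, point)
    return [rotated[i] + translation[i] for i in range(3)]


def epipolar_residual(x1: Vec3, x2: Vec3, rotation: Mat3, translation: Vec3) -> float:
    """Value of x2^T E x1 with E = [t]x R; zero for a consistent match."""
    essential = matmul(skew(translation), rotation)
    ex1 = matvec(essential, x1)
    return sum(x2[i] * ex1[i] for i in range(3))


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    shift = [1.0, 0.0, 0.0]
    landmark = [0.5, 0.2, 2.0]
    x1 = project(landmark)
    x2 = project(transform(identity, shift, landmark))
    print(epipolar_residual(x1, x2, identity, shift))
```

The residual depends only on the two image observations and the relative motion, which is why a filter can use it as a measurement without adding the feature positions to its state.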