CV · Nov 10, 2020 · Code
A New Framework for Registration of Semantic Point Clouds from Stereo and RGB-D Cameras
Ray Zhang, Tzu-Yuan Lin, Chien Erh Lin et al.
This paper reports on a novel nonparametric rigid point cloud registration framework that jointly integrates geometric and semantic measurements such as color or semantic labels into the alignment process and does not require explicit data association. The point clouds are represented as nonparametric functions in a reproducing kernel Hilbert space. The alignment problem is formulated as maximizing the inner product between two functions, essentially a sum of weighted kernels, each of which exploits the local geometric and semantic features. As a result of the continuous models, analytical gradients can be computed, and a local solution can be obtained by optimization over the rigid body transformation group. We also present a new point cloud alignment metric that is intrinsic to the proposed framework and takes into account geometric and semantic information. The evaluations using publicly available stereo and RGB-D datasets show that the proposed method outperforms state-of-the-art outdoor and indoor frame-to-frame registration methods. An open-source GPU implementation is also provided.
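The inner-product objective described in this abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' GPU implementation: the squared-exponential kernels, the length-scales `ell` and `ell_f`, and the function names `rkhs_inner_product` and `transform` are all assumptions made for the example.

```python
import numpy as np

def transform(R, t, X):
    """Apply a rigid-body transform (rotation R, translation t) to points X of shape (N, 3)."""
    return X @ R.T + t

def rkhs_inner_product(X, Z, fx, fz, ell=0.5, ell_f=0.1):
    """Sum of weighted kernels between two clouds (hypothetical helper).

    X, Z   : point coordinates, shapes (N, 3) and (M, 3)
    fx, fz : per-point features such as color, shapes (N, D) and (M, D)
    ell    : geometric length-scale (assumed value)
    ell_f  : feature length-scale (assumed value)
    """
    # Pairwise squared distances in geometry and in feature space.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    f2 = ((fx[:, None, :] - fz[None, :, :]) ** 2).sum(axis=-1)
    # Each pair contributes a geometric kernel weighted by a feature kernel,
    # so no explicit point-to-point association is needed.
    k_geo = np.exp(-d2 / (2.0 * ell ** 2))
    k_feat = np.exp(-f2 / (2.0 * ell_f ** 2))
    return float((k_geo * k_feat).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    colors = rng.uniform(size=(50, 3))
    t_bad = np.array([10.0, 0.0, 0.0])
    print(rkhs_inner_product(X, X, colors, colors))
    print(rkhs_inner_product(transform(np.eye(3), t_bad, X), X, colors, colors))
```

A registration routine would search over the rotation and translation that maximize this value; because the expression is smooth in the transform, its gradient can be derived analytically, which is the property the abstract highlights.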
CV · Mar 21, 2020 · Code
Monocular Depth Prediction through Continuous 3D Loss
Minghan Zhu, Maani Ghaffari, Yuanxin Zhong et al.
This paper reports a new continuous 3D loss function for learning depth from monocular images. The dense depth prediction from a monocular image is supervised using sparse LIDAR points, which enables us to leverage available open source datasets with camera-LIDAR sensor suites during training. Currently, accurate and affordable range sensors are not readily available. Stereo cameras and LIDARs measure depth either inaccurately or sparsely and at high cost. In contrast to the current point-to-point loss evaluation approach, the proposed 3D loss treats point clouds as continuous objects; therefore, it compensates for the lack of dense ground truth depth caused by the sparsity of LIDAR measurements. We applied the proposed loss in three state-of-the-art monocular depth prediction approaches: DORN, BTS, and Monodepth2. Experimental evaluation shows that the proposed loss improves the depth prediction accuracy and produces point clouds with more consistent 3D geometric structures compared with all tested baselines, implying the benefit of the proposed loss on general depth prediction networks. A video demo of this work is available at https://youtu.be/5HL8BjSAY4Y.
RO · Jun 18, 2019 · Code
Characterizing the Uncertainty of Jointly Distributed Poses in the Lie Algebra
Joshua G. Mangelson, Maani Ghaffari, Ram Vasudevan et al.
An accurate characterization of pose uncertainty is essential for safe autonomous navigation. Early pose uncertainty characterization methods proposed by Smith, Self, and Cheeseman (SSC) used coordinate-based first-order methods to propagate uncertainty through non-linear functions such as pose composition (head-to-tail), pose inversion, and relative pose extraction (tail-to-tail). Characterizing uncertainty in the Lie Algebra of the special Euclidean group results in better uncertainty estimates. However, existing approaches assume that individual poses are independent. Since factors in a pose graph induce correlation, this independence assumption usually does not hold in practice. In addition, prior work has focused primarily on the pose composition operation. This paper develops a framework for modeling the uncertainty of jointly distributed poses and describes how to perform the equivalent of the SSC pose operations while characterizing uncertainty in the Lie Algebra. Evaluation on simulated and open-source datasets shows that the proposed methods result in more accurate uncertainty estimates. An accompanying C++ library implementation is also released. This is a pre-print of a paper submitted to IEEE TRO in 2019.
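To make the role of the cross-covariance concrete, here is a first-order sketch of pose composition for two correlated poses. It is an illustration only and not the paper's derivation or its C++ library: it works on SE(2) rather than SE(3), assumes left-multiplied perturbations ordered as (v_x, v_y, ω), and the names `se2`, `se2_adjoint`, and `compose_with_joint_cov` are hypothetical.

```python
import numpy as np

def se2(x, y, theta):
    """Build a 3x3 homogeneous SE(2) matrix from a translation and a heading."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0.0, 0.0, 1.0]])

def se2_adjoint(T):
    """Adjoint of T acting on Lie algebra coordinates (v_x, v_y, omega)."""
    R, t = T[:2, :2], T[:2, 2]
    Ad = np.eye(3)
    Ad[:2, :2] = R
    Ad[:2, 2] = [t[1], -t[0]]
    return Ad

def compose_with_joint_cov(T1, T2, Sigma_joint):
    """First-order covariance of T1 @ T2 for jointly distributed poses.

    Assumes T_i = exp(xi_i^) @ mean_i with [xi_1, xi_2] ~ N(0, Sigma_joint),
    where Sigma_joint is 6x6 and its off-diagonal 3x3 blocks hold the
    cross-covariance that an independence assumption would drop.
    """
    # To first order the composed perturbation is xi_1 + Ad(T1) xi_2.
    J = np.hstack([np.eye(3), se2_adjoint(T1)])
    return T1 @ T2, J @ Sigma_joint @ J.T
```

Setting the off-diagonal blocks of `Sigma_joint` to zero recovers the familiar independent-pose result, the sum of the first covariance and the adjoint-transformed second covariance.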
RO · Jun 28, 2021
Multitask Learning for Scalable and Dense Multilayer Bayesian Map Inference
Lu Gan, Youngji Kim, Jessy W. Grizzle et al.
This article presents a novel and flexible multitask multilayer Bayesian mapping framework with readily extendable attribute layers. The proposed framework goes beyond modern metric-semantic maps to provide even richer environmental information for robots in a single mapping formalism while exploiting intralayer and interlayer correlations. It removes the need for a robot to access and process information from many separate maps when performing a complex task, advancing the way robots interact with their environments. To this end, we design a multitask deep neural network with attention mechanisms as our front-end to provide heterogeneous observations for multiple map layers simultaneously. Our back-end runs a scalable closed-form Bayesian inference with only logarithmic time complexity. We apply the framework to build a dense robotic map including metric-semantic occupancy and traversability layers. Traversability ground truth labels are automatically generated from exteroceptive sensory data in a self-supervised manner. We present extensive experimental results on publicly available datasets and data collected by a 3D bipedal robot platform and show reliable mapping performance in different environments. Finally, we also discuss how the current framework can be extended to incorporate more information such as friction, signal strength, temperature, and physical quantity concentration using Gaussian map layers. The software for reproducing the presented results or running on customized data is made publicly available.
RO · Feb 2, 2020
DeepLocNet: Deep Observation Classification and Ranging Bias Regression for Radio Positioning Systems
Sahib Singh Dhanjal, Maani Ghaffari, Ryan M. Eustice
WiFi technology has been used pervasively in fine-grained indoor localization, gesture recognition, and adaptive communication. Achieving better performance in these tasks generally boils down to reliably differentiating Line-Of-Sight (LOS) from Non-Line-Of-Sight (NLOS) signal propagation, which generally requires expensive or specialized hardware due to the complex nature of indoor environments. Hence, the development of low-cost accurate positioning systems that exploit available infrastructure is not entirely solved. In this paper, we develop a framework for indoor localization and tracking of ubiquitous mobile devices such as smartphones using on-board sensors. We present a novel deep LOS/NLOS classifier which uses the Received Signal Strength Indicator (RSSI), and can classify the input signal with an accuracy of 85%. The proposed algorithm can globally localize and track a smartphone (or robot) with a priori unknown location, and with a semi-accurate prior map (error within 0.8 m) of the WiFi Access Points (AP). Through simultaneously solving for the trajectory and the map of access points, we recover a trajectory of the device and corrected locations for the access points. Experimental evaluations of the framework show that localization accuracy is increased by using the trained deep network; furthermore, the system becomes robust to any error in the map of APs.
RO · Dec 2, 2019
A Keyframe-based Continuous Visual SLAM for RGB-D Cameras via Nonparametric Joint Geometric and Appearance Representation
Xi Lin, Dingyi Sun, Tzu-Yuan Lin et al.
This paper reports on a robust RGB-D SLAM system that performs well in scarcely textured and structured environments. We present a novel keyframe-based continuous visual odometry that builds on the recently developed continuous sensor registration framework. A joint geometric and appearance representation is the result of transforming the RGB-D images into functions that live in a Reproducing Kernel Hilbert Space (RKHS). We solve both registration and keyframe selection problems via the inner product structure available in the RKHS. We also extend the proposed keyframe-based odometry method to a SLAM system using indirect ORB loop-closure constraints. The experimental evaluations using publicly available RGB-D benchmarks show that the developed keyframe selection technique using continuous visual odometry outperforms its robust dense (and direct) visual odometry equivalent. In addition, the developed SLAM system has better generalization across different training and validation sequences; it is robust to the lack of texture and structure in the scene; and shows comparable performance with the state-of-the-art SLAM systems.
RO · Oct 1, 2019
Adaptive Continuous Visual Odometry from RGB-D Images
Tzu-Yuan Lin, William Clark, Ryan M. Eustice et al.
In this paper, we extend the recently developed continuous visual odometry framework for RGB-D cameras to an adaptive framework via online hyperparameter learning. We focus on the case of isotropic kernels with a scalar as the length-scale. In practice and as expected, the length-scale has remarkable impacts on the performance of the original framework. Previously it was handled using a fixed set of conditions within the solver to reduce the length-scale as the algorithm reaches a local minimum. We automate this process by a greedy gradient descent step at each iteration to find the next-best length-scale. Furthermore, to handle failure cases in the gradient descent step where the gradient is not well-behaved, such as the absence of structure or texture in the scene, we use a search interval for the length-scale and guide it gradually toward the smaller values. This latter strategy reverts the adaptive framework to the original setup. The experimental evaluations using publicly available RGB-D benchmarks show the proposed adaptive continuous visual odometry outperforms the original framework and the current state-of-the-art. We also make the software for the developed algorithm publicly available.
RO · Sep 10, 2019
Bayesian Spatial Kernel Smoothing for Scalable Dense Semantic Mapping
Lu Gan, Ray Zhang, Jessy W. Grizzle et al.
This paper develops a Bayesian continuous 3D semantic occupancy map from noisy point clouds by generalizing the Bayesian kernel inference model for building occupancy maps, a binary problem, to semantic maps, a multi-class problem. The proposed method provides a unified probabilistic model for both occupancy and semantic probabilities and nicely reverts to the original occupancy mapping framework when only one occupied class exists in the obtained measurements. The Bayesian spatial kernel inference relaxes the independent grid assumption and brings smoothness and continuity to the map inference, which makes it possible to exploit local correlations present in the environment and improves performance. The accompanying software uses multi-threading and vectorization, and runs at about 2 Hz on a laptop CPU. Evaluations using multiple sequences of stereo camera and LiDAR datasets show that the proposed method consistently outperforms current baselines. We also present a qualitative evaluation using data collected with a bipedal robot platform on the University of Michigan North Campus.
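The multi-class generalization described in this abstract can be illustrated with a kernel-weighted Dirichlet update at a single query location. This is a sketch under stated assumptions, not the released software: the compactly supported kernel form, the length-scale `ell`, the prior value, and the function names are chosen for illustration.

```python
import numpy as np

def sparse_kernel(d, ell=0.3, sigma0=1.0):
    """Compactly supported kernel: sigma0 at d = 0, exactly zero for d >= ell.

    The specific functional form is an assumption made for this sketch.
    """
    d = np.asarray(d, dtype=float)
    r = np.clip(d / ell, 0.0, 1.0)
    k = sigma0 * ((2.0 + np.cos(2.0 * np.pi * r)) / 3.0 * (1.0 - r)
                  + np.sin(2.0 * np.pi * r) / (2.0 * np.pi))
    # Guard against tiny negative values from floating-point round-off.
    return np.where(d < ell, np.maximum(k, 0.0), 0.0)

def update_dirichlet(alpha, query, points, labels, ell=0.3):
    """Add kernel-weighted class counts from labeled points to the prior alpha.

    alpha  : Dirichlet concentration parameters at the query, shape (K,)
    query  : query location, shape (3,)
    points : measured point coordinates, shape (N, 3)
    labels : integer class label per point, shape (N,)
    """
    weights = sparse_kernel(np.linalg.norm(points - query, axis=1), ell)
    onehot = np.eye(alpha.size)[labels]
    # Nearby points contribute more; points outside the support contribute nothing.
    return alpha + weights @ onehot

def class_probabilities(alpha):
    """Mean of the Dirichlet posterior, i.e. expected class probabilities."""
    return alpha / alpha.sum()
```

With a single occupied class plus a free class, the same update reduces to a two-parameter (binary) count, which mirrors the abstract's remark that the model reverts to occupancy mapping in that case.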
RO · Apr 19, 2019
Contact-Aided Invariant Extended Kalman Filtering for Robot State Estimation
Ross Hartley, Maani Ghaffari, Ryan M. Eustice et al.
Legged robots require knowledge of pose and velocity in order to maintain stability and execute walking paths. Current solutions either rely on vision data, which is susceptible to environmental and lighting conditions, or fusion of kinematic and contact data with measurements from an inertial measurement unit (IMU). In this work, we develop a contact-aided invariant extended Kalman filter (InEKF) using the theory of Lie groups and invariant observer design. This filter combines contact-inertial dynamics with forward kinematic corrections to estimate pose and velocity along with all current contact points. We show that the error dynamics follows a log-linear autonomous differential equation with several important consequences: (a) the observable state variables can be rendered convergent with a domain of attraction that is independent of the system's trajectory; (b) unlike the standard EKF, neither the linearized error dynamics nor the linearized observation model depend on the current state estimate, which (c) leads to improved convergence properties and (d) a local observability matrix that is consistent with the underlying nonlinear system. Furthermore, we demonstrate how to include IMU biases, add/remove contacts, and formulate both world-centric and robo-centric versions. We compare the convergence of the proposed InEKF with the commonly used quaternion-based EKF through both simulations and experiments on a Cassie-series bipedal robot. Filter accuracy is analyzed using motion capture, while a LiDAR mapping experiment provides a practical use case. Overall, the developed contact-aided InEKF provides better performance in comparison with the quaternion-based EKF as a result of exploiting symmetries present in the system.
RO · Apr 3, 2019
Continuous Direct Sparse Visual Odometry from RGB-D Images
Maani Ghaffari, William Clark, Anthony Bloch et al.
This paper reports on a novel formulation and evaluation of visual odometry from RGB-D images. Assuming a static scene, the developed theoretical framework generalizes the widely used direct energy formulation (photometric error minimization) technique for obtaining a rigid body transformation that aligns two overlapping RGB-D images to a continuous formulation. The continuity is achieved through functional treatment of the problem and representing the process models over RGB-D images in a reproducing kernel Hilbert space; consequently, the registration is not limited to the specific image resolution and the framework is fully analytical with a closed-form derivation of the gradient. We solve the problem by maximizing the inner product between two functions defined over RGB-D images, while the continuous action of the rigid body motion Lie group is captured through the integration of the flow in the corresponding Lie algebra. Energy-based approaches have been extremely successful and the developed framework in this paper shares many of their desired properties such as the parallel structure on both CPUs and GPUs, sparsity, semi-dense tracking, avoiding explicit data association which is computationally expensive, and possible extensions to the simultaneous localization and mapping frameworks. The evaluations on experimental data and comparison with the equivalent energy-based formulation of the problem confirm the effectiveness of the proposed technique, especially, when the lack of structure and texture in the environment is evident.
RO · Sep 20, 2018
Guaranteed Globally Optimal Planar Pose Graph and Landmark SLAM via Sparse-Bounded Sums-of-Squares Programming
Joshua G. Mangelson, Jinsun Liu, Ryan M. Eustice et al.
Autonomous navigation requires an accurate model or map of the environment. While dramatic progress in the prior two decades has enabled large-scale SLAM, the majority of existing methods rely on non-linear optimization techniques to find the MLE of the robot trajectory and surrounding environment. These methods are prone to local minima and are thus sensitive to initialization. Several recent papers have developed optimization algorithms for the Pose-Graph SLAM problem that can certify the optimality of a computed solution. Though this does not guarantee a priori that this approach generates an optimal solution, a recent extension has shown that when the noise lies within a critical threshold, the solution to the optimization algorithm is guaranteed to be optimal. To address the limitations of existing approaches, this paper illustrates that the Pose-Graph SLAM and Landmark SLAM can be formulated as polynomial optimization programs that are SOS convex. This paper then describes how the Pose-Graph and Landmark SLAM problems can be solved to a global minimum without initialization regardless of noise level using the Sparse-BSOS hierarchy. This paper also empirically illustrates that convergence happens at the second step in this hierarchy. In addition, this paper illustrates how this Sparse-BSOS hierarchy can be implemented in the complex domain and empirically shows that convergence also happens at the second step of this complex domain hierarchy. Finally, the superior performance of the proposed approach when compared to existing SLAM methods is illustrated on graphs with several hundred nodes.
RO · May 26, 2018
Contact-Aided Invariant Extended Kalman Filtering for Legged Robot State Estimation
Ross Hartley, Maani Ghaffari Jadidi, Jessy W. Grizzle et al.
This paper derives a contact-aided inertial navigation observer for a 3D bipedal robot using the theory of invariant observer design. Aided inertial navigation is fundamentally a nonlinear observer design problem; thus, current solutions are based on approximations of the system dynamics, such as an Extended Kalman Filter (EKF), which uses a system's Jacobian linearization along the current best estimate of its trajectory. On the basis of the theory of invariant observer design by Barrau and Bonnabel, and in particular, the Invariant EKF (InEKF), we show that the error dynamics of the point contact-inertial system follows a log-linear autonomous differential equation; hence, the observable state variables can be rendered convergent with a domain of attraction that is independent of the system's trajectory. Due to the log-linear form of the error dynamics, it is not necessary to perform a nonlinear observability analysis to show that when using an Inertial Measurement Unit (IMU) and contact sensors, the absolute position of the robot and a rotation about the gravity vector (yaw) are unobservable. We further augment the state of the developed InEKF with IMU biases, as the online estimation of these parameters has a crucial impact on system performance. We evaluate the convergence of the proposed system with the commonly used quaternion-based EKF observer using a Monte-Carlo simulation. In addition, our experimental evaluation using a Cassie-series bipedal robot shows that the contact-aided InEKF provides better performance in comparison with the quaternion-based EKF as a result of exploiting symmetries present in the system dynamics.
RO · Mar 20, 2018
Hybrid Contact Preintegration for Visual-Inertial-Contact State Estimation Using Factor Graphs
Ross Hartley, Maani Ghaffari Jadidi, Lu Gan et al.
The factor graph framework is a convenient modeling technique for robotic state estimation where states are represented as nodes, and measurements are modeled as factors. When designing a sensor fusion framework for legged robots, one often has access to visual, inertial, joint encoder, and contact sensors. While visual-inertial odometry has been studied extensively in this framework, the addition of a preintegrated contact factor for legged robots has been only recently proposed. This allowed for integration of encoder and contact measurements into existing factor graphs; however, new nodes had to be added to the graph every time contact was made or broken. In this work, to cope with the problem of switching contact frames, we propose a hybrid contact preintegration theory that allows contact information to be integrated through an arbitrary number of contact switches. The proposed hybrid modeling approach reduces the number of required variables in the nonlinear optimization problem by only requiring new states to be added alongside camera or selected keyframes. This method is evaluated using real experimental data collected from a Cassie-series robot where the trajectory of the robot produced by a motion capture system is used as a proxy for ground truth. The evaluation shows that inclusion of the proposed preintegrated hybrid contact factor alongside visual-inertial navigation systems improves estimation accuracy as well as robustness to vision failure, while its generalization makes it more accessible for legged platforms.
RO · Dec 15, 2017
Legged Robot State-Estimation Through Combined Forward Kinematic and Preintegrated Contact Factors
Ross Hartley, Josh Mangelson, Lu Gan et al.
State-of-the-art robotic perception systems have achieved sufficiently good performance using Inertial Measurement Units (IMUs), cameras, and nonlinear optimization techniques, that they are now being deployed as technologies. However, many of these methods rely significantly on vision and often fail when visual tracking is lost due to lighting or scarcity of features. This paper presents a state-estimation technique for legged robots that takes into account the robot's kinematic model as well as its contact with the environment. We introduce forward kinematic factors and preintegrated contact factors into a factor graph framework that can be incrementally solved in real-time. The forward kinematic factor relates the robot's base pose to a contact frame through noisy encoder measurements. The preintegrated contact factor provides odometry measurements of this contact frame while accounting for possible foot slippage. Together, the two developed factors constrain the graph optimization problem allowing the robot's trajectory to be estimated. The paper evaluates the method using simulated and real sensory IMU and kinematic data from experiments with a Cassie-series robot designed by Agility Robotics. These preliminary experiments show that using the proposed method in addition to IMU decreases drift and improves localization accuracy, suggesting that its use can enable successful recovery from a loss of visual tracking.
RO · Sep 22, 2017
Sparse Bayesian Inference for Dense Semantic Mapping
Lu Gan, Maani Ghaffari Jadidi, Steven A. Parkison et al.
Despite impressive advances in simultaneous localization and mapping, dense robotic mapping remains challenging due to its inherent nature of being a high-dimensional inference problem. In this paper, we propose a dense semantic robotic mapping technique that exploits sparse Bayesian models, in particular, the relevance vector machine, for high-dimensional sequential inference. The technique is based on the principle of automatic relevance determination and produces sparse models that use a small subset of the original dense training set as the dominant basis. The resulting map posterior is continuous, and queries can be made efficiently at any resolution. Moreover, the technique has probabilistic outputs per semantic class through Bayesian inference. We evaluate the proposed relevance vector semantic map using publicly available benchmark datasets, NYU Depth V2 and KITTI; and the results show promising improvements over the state-of-the-art techniques.
RO · Jul 5, 2017
Gaussian Processes Semantic Map Representation
Maani Ghaffari Jadidi, Lu Gan, Steven A. Parkison et al.
In this paper, we develop a high-dimensional map building technique that incorporates raw pixelated semantic measurements into the map representation. The proposed technique uses Gaussian Processes (GPs) multi-class classification for map inference and is the natural extension of GP occupancy maps from binary to multi-class form. The technique exploits the continuous property of GPs and, as a result, the map can be inferred with any resolution. In addition, the proposed GP Semantic Map (GPSM) learns the structural and semantic correlation from measurements rather than resorting to assumptions, and can flexibly learn the spatial correlation as well as any additional non-spatial correlation between map points. We extend the OctoMap to Semantic OctoMap representation and compare with the GPSM mapping performance using NYU Depth V2 dataset. Evaluations of the proposed technique on multiple partially labeled RGBD scans and labels from noisy image segmentation show that the GP semantic map can handle sparse measurements, missing labels in the point cloud, as well as noise corrupted labels.
CV · Feb 23, 2017
WaterGAN: Unsupervised Generative Network to Enable Real-time Color Correction of Monocular Underwater Images
Jie Li, Katherine A. Skinner, Ryan M. Eustice et al.
This paper reports on WaterGAN, a generative adversarial network (GAN) for generating realistic underwater images from in-air image and depth pairings in an unsupervised pipeline used for color correction of monocular underwater images. Cameras onboard autonomous and remotely operated vehicles can capture high resolution images to map the seafloor; however, underwater image formation is subject to the complex process of light propagation through the water column. The raw images retrieved are characteristically different from images taken in air due to effects such as absorption and scattering, which cause attenuation of light at different rates for different wavelengths. While this physical process is well described theoretically, the model depends on many parameters intrinsic to the water column as well as the objects in the scene. These factors make recovery of these parameters difficult without simplifying assumptions or field calibration; hence, restoration of underwater images is a non-trivial problem. Deep learning has demonstrated great success in modeling complex nonlinear systems but requires a large amount of training data, which is difficult to compile in deep sea environments. Using WaterGAN, we generate a large training dataset of paired imagery, both raw underwater and true color in-air, as well as depth data. This data serves as input to a novel end-to-end network for color correction of monocular underwater images. Due to the depth-dependent water column effects inherent to underwater environments, we show that our end-to-end network implicitly learns a coarse depth estimate of the underwater scene from monocular underwater images. Our proposed pipeline is validated with testing on real data collected from both a pure water tank and from underwater surveys in field testing. Source code is made publicly available with sample datasets and pretrained models.