CVMar 17, 2022
STPLS3D: A Large-Scale Synthetic and Real Aerial Photogrammetry 3D Point Cloud DatasetMeida Chen, Qingyong Hu, Zifan Yu et al.
Although various 3D datasets with different functions and scales have been proposed recently, it remains challenging for individuals to complete the whole pipeline of large-scale data collection, sanitization, and annotation. Moreover, the created datasets usually suffer from extremely imbalanced class distribution or partial low-quality data samples. Motivated by this, we explore the procedurally synthetic 3D data generation paradigm to equip individuals with the full capability of creating large-scale annotated photogrammetry point clouds. Specifically, we introduce a synthetic aerial photogrammetry point clouds generation pipeline that takes full advantage of open geospatial data sources and off-the-shelf commercial packages. Unlike generating synthetic data in virtual games, where the simulated data usually have limited gaming environments created by artists, the proposed pipeline simulates the reconstruction process of the real environment by following the same UAV flight pattern on different synthetic terrain shapes and building densities, which ensure similar quality, noise pattern, and diversity with real data. In addition, the precise semantic and instance annotations can be generated fully automatically, avoiding the expensive and time-consuming manual annotation. Based on the proposed pipeline, we present a richly-annotated synthetic 3D aerial photogrammetry point cloud dataset, termed STPLS3D, with more than 16 $km^2$ of landscapes and up to 18 fine-grained semantic categories. For verification purposes, we also provide a parallel dataset collected from four areas in the real environment. Extensive experiments conducted on our datasets demonstrate the effectiveness and quality of the proposed synthetic dataset.
ROSep 23, 2022
SAFER: Safe Collision Avoidance using Focused and Efficient Trajectory Search with Reinforcement LearningMario Srouji, Hugues Thomas, Hubert Tsai et al.
Collision avoidance is key for mobile robots and agents to operate safely in the real world. In this work we present SAFER, an efficient and effective collision avoidance system that is able to improve safety by correcting the control commands sent by an operator. It combines real-world reinforcement learning (RL), search-based online trajectory planning, and automatic emergency intervention, e.g. automatic emergency braking (AEB). The goal of the RL is to learn an effective corrective control action that is used in a focused search for collision-free trajectories, and to reduce the frequency of triggering automatic emergency braking. This novel setup enables the RL policy to learn safely and directly on mobile robots in a real-world indoor environment, minimizing actual crashes even during training. Our real-world experiments show that, when compared with several baselines, our approach enjoys a higher average speed, lower crash rate, less emergency intervention, smaller computation overhead, and smoother overall control.
CVMay 21, 2024
KPConvX: Modernizing Kernel Point Convolution with Kernel AttentionHugues Thomas, Yao-Hung Hubert Tsai, Timothy D. Barfoot et al.
In the field of deep point cloud understanding, KPConv is a unique architecture that uses kernel points to locate convolutional weights in space, instead of relying on Multi-Layer Perceptron (MLP) encodings. While it initially achieved success, it has since been surpassed by recent MLP networks that employ updated designs and training strategies. Building upon the kernel point principle, we present two novel designs: KPConvD (depthwise KPConv), a lighter design that enables the use of deeper architectures, and KPConvX, an innovative design that scales the depthwise convolutional weights of KPConvD with kernel attention values. Using KPConvX with a modern architecture and training strategy, we are able to outperform current state-of-the-art approaches on the ScanObjectNN, Scannetv2, and S3DIS datasets. We validate our design choices through ablation studies and release our code and models.
ROOct 14, 2024
DR-MPC: Deep Residual Model Predictive Control for Real-world Social NavigationJames R. Han, Hugues Thomas, Jian Zhang et al.
How can a robot safely navigate around people with complex motion patterns? Deep Reinforcement Learning (DRL) in simulation holds some promise, but much prior work relies on simulators that fail to capture the nuances of real human motion. Thus, we propose Deep Residual Model Predictive Control (DR-MPC) to enable robots to quickly and safely perform DRL from real-world crowd navigation data. By blending MPC with model-free DRL, DR-MPC overcomes the DRL challenges of large data requirements and unsafe initial behavior. DR-MPC is initialized with MPC-based path tracking, and gradually learns to interact more effectively with humans. To further accelerate learning, a safety component estimates out-of-distribution states to guide the robot away from likely collisions. In simulation, we show that DR-MPC substantially outperforms prior work, including traditional DRL and residual DRL models. Hardware experiments show our approach successfully enables a robot to navigate a variety of crowded situations with few errors using less than 4 hours of training data.
ROAug 24, 2021
Learning Spatiotemporal Occupancy Grid Maps for Lifelong Navigation in Dynamic ScenesHugues Thomas, Matthieu Gallet de Saint Aurin, Jian Zhang et al.
We present a novel method for generating, predicting, and using Spatiotemporal Occupancy Grid Maps (SOGM), which embed future information of dynamic scenes. Our automated generation process creates groundtruth SOGMs from previous navigation data. We build on prior work to annotate lidar points based on their dynamic properties, which are then projected on time-stamped 2D grids: SOGMs. We design a 3D-2D feedforward architecture, trained to predict the future time steps of SOGMs, given 3D lidar frames as input. Our pipeline is entirely self-supervised, thus enabling lifelong learning for robots. The network is composed of a 3D back-end that extracts rich features and enables the semantic segmentation of the lidar frames, and a 2D front-end that predicts the future information embedded in the SOGMs within planning. We also design a navigation pipeline that uses these predicted SOGMs. We provide both quantitative and qualitative insights into the predictions and validate our choices of network design with a comparison to the state of the art and ablation studies.
ROFeb 22, 2021
Unsupervised Learning of Lidar Features for Use in a Probabilistic Trajectory EstimatorDavid J. Yoon, Haowei Zhang, Mona Gridseth et al.
We present unsupervised parameter learning in a Gaussian variational inference setting that combines classic trajectory estimation for mobile robots with deep learning for rich sensor data, all under a single learning objective. The framework is an extension of an existing system identification method that optimizes for the observed data likelihood, which we improve with modern advances in batch trajectory estimation and deep learning. Though the framework is general to any form of parameter learning and sensor modality, we demonstrate application to feature and uncertainty learning with a deep network for 3D lidar odometry. Our framework learns from only the on-board lidar data, and does not require any form of groundtruth supervision. We demonstrate that our lidar odometry performs better than existing methods that learn the full estimator with a deep network, and comparable to state-of-the-art ICP-based methods on the KITTI odometry dataset. We additionally show results on lidar data from the Oxford RobotCar dataset.
RODec 10, 2020
Self-Supervised Learning of Lidar Segmentation for Autonomous Indoor NavigationHugues Thomas, Ben Agro, Mona Gridseth et al.
We present a self-supervised learning approach for the semantic segmentation of lidar frames. Our method is used to train a deep point cloud segmentation architecture without any human annotation. The annotation process is automated with the combination of simultaneous localization and mapping (SLAM) and ray-tracing algorithms. By performing multiple navigation sessions in the same environment, we are able to identify permanent structures, such as walls, and disentangle short-term and long-term movable objects, such as people and tables, respectively. New sessions can then be performed using a network trained to predict these semantic labels. We demonstrate the ability of our approach to improve itself over time, from one session to the next. With semantically filtered point clouds, our robot can navigate through more complex scenarios, which, when added to the training pool, help to improve our network predictions. We provide insights into our network predictions and show that our approach can also improve the performances of common localization techniques.
CVDec 7, 2020
Rotation-Invariant Point Convolution With Multiple Equivariant AlignmentsHugues Thomas
Recent attempts at introducing rotation invariance or equivariance in 3D deep learning approaches have shown promising results, but these methods still struggle to reach the performances of standard 3D neural networks. In this work we study the relation between equivariance and invariance in 3D point convolutions. We show that using rotation-equivariant alignments, it is possible to make any convolutional layer rotation-invariant. Furthermore, we improve this simple alignment procedure by using the alignment themselves as features in the convolution, and by combining multiple alignments together. With this core layer, we design rotation-invariant architectures which improve state-of-the-art results in both object classification and semantic segmentation and reduces the gap between rotation-invariant and standard 3D deep learning approaches.
CVApr 18, 2019
KPConv: Flexible and Deformable Convolution for Point CloudsHugues Thomas, Charles R. Qi, Jean-Emmanuel Deschaud et al.
We present Kernel Point Convolution (KPConv), a new design of point convolution, i.e. that operates on point clouds without any intermediate representation. The convolution weights of KPConv are located in Euclidean space by kernel points, and applied to the input points close to them. Its capacity to use any number of kernel points gives KPConv more flexibility than fixed grid convolutions. Furthermore, these locations are continuous in space and can be learned by the network. Therefore, KPConv can be extended to deformable convolutions that learn to adapt kernel points to local geometry. Thanks to a regular subsampling strategy, KPConv is also efficient and robust to varying densities. Whether they use deformable KPConv for complex tasks, or rigid KPconv for simpler tasks, our networks outperform state-of-the-art classification and segmentation approaches on several datasets. We also offer ablation studies and visualizations to provide understanding of what has been learned by KPConv and to validate the descriptive power of deformable KPConv.
CVAug 1, 2018
Semantic Classification of 3D Point Clouds with Multiscale Spherical NeighborhoodsHugues Thomas, Jean-Emmanuel Deschaud, Beatriz Marcotegui et al.
This paper introduces a new definition of multiscale neighborhoods in 3D point clouds. This definition, based on spherical neighborhoods and proportional subsampling, allows the computation of features with a consistent geometrical meaning, which is not the case when using k-nearest neighbors. With an appropriate learning strategy, the proposed features can be used in a random forest to classify 3D points. In this semantic classification task, we show that our multiscale features outperform state-of-the-art features using the same experimental conditions. Furthermore, their classification power competes with more elaborate classification approaches including Deep Learning methods.