CV · Aug 2, 2023 · Code
MDT3D: Multi-Dataset Training for LiDAR 3D Object Detection Generalization
Louis Soum-Fontez, Jean-Emmanuel Deschaud, François Goulette
Supervised 3D Object Detection models have shown increasingly better performance in single-domain cases, where the training data comes from the same environment and sensor as the testing data. However, in real-world scenarios, data from the target domain may not be available for finetuning or for domain adaptation methods. Indeed, 3D object detection models trained on a source dataset with a specific point distribution have shown difficulties in generalizing to unseen datasets. Therefore, we decided to leverage the information available from several annotated source datasets with our Multi-Dataset Training for 3D Object Detection (MDT3D) method to increase the robustness of 3D object detection models when tested in a new environment with a different sensor configuration. To tackle the labelling gap between datasets, we used a new label mapping based on coarse labels. Furthermore, we show how we managed the mix of datasets during training, and finally introduce a new cross-dataset augmentation method: cross-dataset object injection. We demonstrate that this training paradigm improves results for different types of 3D object detection models. The source code and additional results for this research project will be publicly available on GitHub: https://github.com/LouisSF/MDT3D
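The abstract names cross-dataset object injection without detailing it. Below is a minimal sketch of one plausible copy-paste style version. It assumes axis-aligned boxes (a real detector box also has a heading angle), assumes both scans share the same point layout, and uses hypothetical function names. The points inside a box in a source-dataset scan are moved into a target-dataset scan, and target points already occupying the destination box are removed.

```python
import numpy as np

def points_in_box(points, center, size):
    """Boolean mask of points inside an axis-aligned box (yaw ignored for simplicity)."""
    half = np.asarray(size, dtype=float) / 2.0
    return np.all(np.abs(points[:, :3] - np.asarray(center, dtype=float)) <= half, axis=1)

def inject_object(target_points, source_points, box_center, box_size, new_center):
    """Copy the object inside a source-scan box into a target scan at new_center.

    Only the xyz columns are shifted, so extra columns (e.g. intensity) are carried
    along unchanged. Target points inside the destination box are dropped first so
    the injected object does not overlap existing geometry.
    """
    obj = source_points[points_in_box(source_points, box_center, box_size)].copy()
    obj[:, :3] += np.asarray(new_center, dtype=float) - np.asarray(box_center, dtype=float)
    keep = ~points_in_box(target_points, new_center, box_size)
    return np.vstack([target_points[keep], obj])
```

In a full pipeline, the matching ground-truth box would also be appended to the target scan's annotations, using the coarse label of the injected object.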
CV · Nov 6, 2023
COLA: COarse-LAbel multi-source LiDAR semantic segmentation for autonomous driving
Jules Sanchez, Jean-Emmanuel Deschaud, François Goulette
LiDAR semantic segmentation for autonomous driving has been a growing field of interest in recent years. Datasets and methods have appeared and expanded very quickly, but methods have not been updated to exploit this new data availability and still rely on the same classical datasets. The different ways of performing LiDAR semantic segmentation training and inference can be divided into several subfields: domain generalization, source-to-source segmentation, and pre-training. In this work, we aim to improve results in all of these subfields with the novel approach of multi-source training. Multi-source training relies on the availability of various datasets at training time. To overcome the common obstacles in multi-source training, we introduce coarse labels and call the newly created multi-source dataset COLA. We propose three applications of this new dataset that display systematic improvement over single-source strategies: COLA-DG for domain generalization (+10%), COLA-S2S for source-to-source segmentation (+5.3%), and COLA-PT for pre-training (+12%). We demonstrate that multi-source approaches bring systematic improvement over single-source approaches.
CV · Mar 17, 2022
Unsigned Distance Field as an Accurate 3D Scene Representation for Neural Scene Completion
Jean Pierre Richa, Jean-Emmanuel Deschaud, François Goulette et al.
Scene Completion is the task of completing missing geometry from a partial scan of a scene. Most previous methods compute an implicit representation from range data using a Truncated Signed Distance Function (T-SDF) computed on a 3D grid as input to neural networks. The truncation decreases, but does not remove, the border errors introduced by the sign of the SDF for open surfaces. As an alternative, we present an Unsigned Distance Function (UDF) as an input representation for scene completion neural networks. The proposed UDF is simple and efficient as a geometry representation, and can be computed on any point cloud. In contrast to usual Signed Distance Functions, our UDF does not require normal computation. To obtain the explicit geometry, we present a method for extracting a point cloud from discretized UDF values on a sparse grid. We compare different SDFs and UDFs for the scene completion task on indoor and outdoor point clouds collected using RGB-D and LiDAR sensors and show improved completion using the proposed UDF function.
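To make the "no normals needed" point concrete, here is a minimal sketch of an unsigned distance grid computed from a raw point cloud: each voxel stores the distance from its center to the nearest input point, optionally truncated. It is an illustration under stated assumptions, not the paper's exact UDF. It uses a dense grid and brute-force nearest neighbours (the paper uses a sparse grid, and a KD-tree would be the practical choice), and the function name and `truncation` parameter are hypothetical.

```python
import numpy as np

def unsigned_distance_grid(points, grid_min, voxel_size, shape, truncation=None):
    """Distance from each voxel center to the nearest input point (no sign, no normals).

    Brute force over all voxel/point pairs: fine for a toy example only.
    """
    idx = np.stack(np.meshgrid(*[np.arange(n) for n in shape], indexing="ij"), axis=-1)
    centers = np.asarray(grid_min, dtype=float) + (idx + 0.5) * voxel_size  # (X, Y, Z, 3)
    flat = centers.reshape(-1, 3)
    d = np.linalg.norm(flat[:, None, :] - points[None, :, :], axis=-1).min(axis=1)
    if truncation is not None:
        d = np.minimum(d, truncation)  # cap far-away values, as truncated fields do
    return d.reshape(shape)
```

Because only point-to-point distances are involved, the same function applies unchanged to open surfaces, where an SDF's inside/outside sign is ill-defined.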
CV · May 18
Token-Space Mask Prediction for Efficient Vision Transformer Segmentation
Calvin Galagain, Martyna Poreba, François Goulette
Query-based Vision Transformer segmentation models typically reconstruct dense spatial feature maps to predict masks, inheriting design patterns from convolutional architectures. We show that this explicit image-space reconstruction is not required. We introduce TokenMask, a token-space mask head that computes mask logits directly from query-token affinities and performs interpolation in logit space rather than feature space. This reformulation preserves the original linear scoring mechanism while simplifying the computational structure. Across diverse ViT backbones, datasets and segmentation tasks, TokenMask consistently improves efficiency over prior approaches by reducing computational and memory requirements while maintaining competitive accuracy, leading to tangible speedups on NVIDIA Jetson AGX Orin using TensorRT FP16 inference. Overall, TokenMask yields a simpler and more deployment-friendly design for embedded vision systems.
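The claim that interpolating in logit space preserves the linear scoring mechanism follows from linearity: a dot product with a query commutes with any linear interpolation of the feature map. The sketch below checks this numerically. It uses nearest-neighbour upsampling for brevity (bilinear is also linear, so the identity still holds), and the function names and shapes are illustrative assumptions, not TokenMask's actual code.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of the two leading (spatial) axes."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def masks_feature_space(queries, tokens, factor):
    # Dense-head style: rebuild an (H, W, D) feature map, then score every pixel.
    dense = upsample_nearest(tokens, factor)                      # (H, W, D)
    return np.einsum("kd,hwd->khw", queries, dense)               # (K, H, W)

def masks_token_space(queries, tokens, factor):
    # Token-space style: score the h*w tokens, then upsample the K logit maps.
    logits = np.einsum("kd,hwd->hwk", queries, tokens)           # (h, w, K)
    return np.moveaxis(upsample_nearest(logits, factor), -1, 0)  # (K, H, W)
```

Both functions return identical masks. The token-space version never materializes the H x W x D map: it computes K·h·w dot products of length D instead of K·H·W, and only the K-channel logits are interpolated.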
CV · Mar 26, 2021 · Code
3D Point Cloud Registration with Multi-Scale Architecture and Unsupervised Transfer Learning
Sofiane Horache, Jean-Emmanuel Deschaud, François Goulette
We propose a method for generalizing deep learning for 3D point cloud registration to new, totally different datasets. It is based on two components, MS-SVConv and UDGE. Using Multi-Scale Sparse Voxel Convolution, MS-SVConv is a fast deep neural network that outputs descriptors from point clouds for 3D registration between two scenes. UDGE is an algorithm for transferring deep networks to unknown datasets in an unsupervised way. The benefit of the proposed method appears when the two components, MS-SVConv and UDGE, are used together, which leads to state-of-the-art results on real-world registration datasets such as 3DMatch, ETH and TUM. The code is publicly available at https://github.com/humanpose1/MS-SVConv .
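Descriptor-based registration ends by turning matched points into a rigid transform. The abstract does not say which estimator is used (in practice this step is usually wrapped in RANSAC to reject bad matches), so the sketch below only shows the standard closed-form least-squares solution (Kabsch/Umeyama via SVD) for already-matched correspondences.

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ≈ src @ R.T + t.

    src, dst: (N, 3) arrays of corresponding points, e.g. from descriptor matching.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)                  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t
```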
RO · Mar 17, 2021 · Code
What's in My LiDAR Odometry Toolbox?
Pierre Dellenbach, Jean-Emmanuel Deschaud, Bastien Jacquet et al.
With the democratization of 3D LiDAR sensors, precise LiDAR odometries and SLAM are in high demand. New methods regularly appear, proposing solutions ranging from small variations in classical algorithms to radically new paradigms based on deep learning. Yet it is often difficult to compare these methods, notably due to the few datasets on which the methods can be evaluated and compared. Furthermore, their weaknesses are rarely examined, often letting the user discover the hard way whether a method would be appropriate for a use case. In this paper, we review and organize the main 3D LiDAR odometries into distinct categories. We implemented several approaches (geometric based, deep learning based, and hybrid methods) to conduct an in-depth analysis of their strengths and weaknesses on multiple datasets, guiding the reader through the different LiDAR odometries available. Implementation of the methods has been made publicly available at https://github.com/Kitware/pyLiDAR-SLAM.
RO · Apr 21
Multimodal embodiment-aware navigation transformer
Louis Dezons, Quentin Picard, Rémi Marsal et al.
Goal-conditioned navigation models for ground robots trained using supervised learning show promising zero-shot transfer, but their collision-avoidance capability nevertheless degrades under distribution shift, i.e. environmental, robot or sensor configuration changes. We propose ViLiNT, a multimodal, attention-based policy for goal navigation, trained on heterogeneous data from multiple platforms and environments, which improves robustness with two key features. First, we fuse RGB images, 3D LiDAR point clouds, a goal embedding and a robot's embodiment descriptor with a transformer architecture to capture complementary geometry and appearance cues. The transformer's output is used to condition a diffusion model that generates navigable trajectories. Second, using automatically generated offline labels, we train a path clearance prediction head for scoring and ranking trajectories produced by the diffusion model. The diffusion conditioning as well as the trajectory ranking head depend on a robot's embodiment token that allows our model to generate and select trajectories with respect to the robot's dimensions. Across three simulated environments, ViLiNT improves Success Rate on average by 166% over an equivalent state-of-the-art vision-only baseline (NoMaD). This increase in performance is confirmed through real-world deployments of a rover navigating in obstacle fields. These results highlight that combining multimodal fusion with our collision prediction mechanism leads to improved off-road navigation robustness.
CV · Jan 24, 2025
3DLabelProp: Geometric-Driven Domain Generalization for LiDAR Semantic Segmentation in Autonomous Driving
Jules Sanchez, Jean-Emmanuel Deschaud, François Goulette
Domain generalization aims to find ways for deep learning models to maintain their performance despite significant domain shifts between training and inference datasets. This is particularly important for models that need to be robust or are costly to train. LiDAR perception in autonomous driving is impacted by both of these concerns, leading to the emergence of various approaches. This work addresses the challenge by proposing a geometry-based approach that leverages the sequential structure of LiDAR sensors, which sets it apart from the learning-based methods commonly found in the literature. The proposed method, called 3DLabelProp, is applied to the task of LiDAR Semantic Segmentation (LSS). Through extensive experimentation on seven datasets, it is demonstrated to be a state-of-the-art approach, outperforming both naive and other domain generalization methods.
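The abstract only states that 3DLabelProp exploits the sequential structure of LiDAR data. As a generic illustration of that idea, not the paper's algorithm, the sketch below assumes that previous scans are already registered into a common frame with known labels. Each point of the new scan takes the label of its nearest labelled neighbour, and points with no neighbour within a radius are marked -1, leaving them to a learned model. The function name and `radius` parameter are hypothetical.

```python
import numpy as np

def propagate_labels(prev_points, prev_labels, new_points, radius):
    """Nearest-neighbour label propagation from registered past scans to a new scan.

    prev_points: (M, 3), prev_labels: (M,) int array, new_points: (N, 3), all in one frame.
    Returns (N,) labels, with -1 where no labelled point lies within `radius`.
    """
    d = np.linalg.norm(new_points[:, None, :] - prev_points[None, :, :], axis=-1)  # (N, M)
    labels = prev_labels[d.argmin(axis=1)].copy()
    labels[d.min(axis=1) > radius] = -1
    return labels
```

The geometric step needs no training, which is what makes such an approach attractive under domain shift. Only the unmatched points still depend on a model.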
CV · Oct 31, 2024
HD-OOD3D: Supervised and Unsupervised Out-of-Distribution object detection in LiDAR data
Louis Soum-Fontez, Jean-Emmanuel Deschaud, François Goulette
Autonomous systems rely on accurate 3D object detection from LiDAR data, yet most detectors are limited to a predefined set of known classes, making them vulnerable to unexpected out-of-distribution (OOD) objects. In this work, we present HD-OOD3D, a novel two-stage method for detecting unknown objects. We demonstrate the superiority of two-stage approaches over single-stage methods, achieving more robust detection of unknown objects while addressing key challenges in the evaluation protocol. Furthermore, we conduct an in-depth analysis of the standard evaluation protocol for OOD detection, revealing the critical impact of hyperparameter choices. To address the challenge of scaling the learning of unknown objects, we explore unsupervised training strategies to generate pseudo-labels for unknowns. Among the different approaches evaluated, our experiments show that top-5 auto-labelling offers more promising performance compared to simple resizing techniques.
CV · Feb 14, 2022
COLA: COarse LAbel pre-training for 3D semantic segmentation of sparse LiDAR datasets
Jules Sanchez, Jean-Emmanuel Deschaud, François Goulette
Transfer learning is a proven technique in 2D computer vision to leverage the large amount of data available and achieve high performance with datasets limited in size due to the cost of acquisition or annotation. In 3D, annotation is known to be a costly task; nevertheless, pre-training methods have only recently been investigated. Due to this cost, unsupervised pre-training has been heavily favored. In this work, we tackle the case of real-time 3D semantic segmentation of sparse autonomous driving LiDAR scans. Such datasets have been increasingly released, but each has a unique label set. We propose here an intermediate-level label set called coarse labels, which can easily be used on any existing and future autonomous driving datasets, thus allowing all the data available to be leveraged at once without any additional manual labeling. This way, we have access to a larger dataset, alongside a simple task of semantic segmentation. With it, we introduce a new pre-training task: coarse label pre-training, also called COLA. We thoroughly analyze the impact of COLA on various datasets and architectures and show that it yields a noticeable performance improvement, especially when only a small dataset is available for the finetuning task.
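The coarse-label idea amounts to a per-dataset lookup from each dataset's own fine classes to one shared intermediate label set, so that scans from every dataset can be trained on together. The sketch below shows the mechanism only. The coarse class list, the dataset names and the fine-to-coarse entries are all hypothetical placeholders, not the COLA label set defined in the paper.

```python
# Hypothetical coarse classes and per-dataset fine-label mappings (illustrative only).
COARSE = ["ground", "structure", "vehicle", "nature", "person", "object"]

FINE_TO_COARSE = {
    "dataset_a": {"road": "ground", "sidewalk": "ground", "building": "structure",
                  "car": "vehicle", "truck": "vehicle", "vegetation": "nature",
                  "pedestrian": "person", "pole": "object"},
    "dataset_b": {"driveable_surface": "ground", "manmade": "structure", "car": "vehicle",
                  "bus": "vehicle", "terrain": "nature", "adult": "person"},
}

def to_coarse_ids(dataset, fine_labels, ignore_index=-1):
    """Map one scan's fine label names to shared coarse class indices.

    Fine labels with no coarse equivalent get `ignore_index`, so they can be
    excluded from the loss.
    """
    mapping = FINE_TO_COARSE[dataset]
    return [COARSE.index(mapping[name]) if name in mapping else ignore_index
            for name in fine_labels]
```

Adding a new dataset only requires writing its mapping table, without any additional manual annotation, which matches the abstract's claim that coarse labels apply to existing and future datasets.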
CV · Sep 30, 2021
Riedones3D: a celtic coin dataset for registration and fine-grained clustering
Sofiane Horache, Jean-Emmanuel Deschaud, François Goulette et al.
Clustering coins with respect to their die is an important component of numismatic research and crucial for understanding the economic history of tribes (especially when no literary production exists, as in Celtic culture). It is a very hard task that requires a lot of time and expertise. To cluster thousands of coins, automatic methods are becoming necessary. Nevertheless, public datasets for coin die clustering evaluation are too rare, though they are very important for the development of new methods. Therefore, we propose a new 3D dataset of 2,070 scans of coins. With this dataset, we propose two benchmarks: one for point cloud registration, essential for coin die recognition, and one for coin die clustering. We show how we automatically cluster coins to help experts, and perform a preliminary evaluation for these two tasks. The code of the baseline and the dataset will be publicly available at https://www.npm3d.fr/coins-riedones3d and https://www.chronocarto.eu/spip.php?article84&lang=fr
RO · Sep 27, 2021
CT-ICP: Real-time Elastic LiDAR Odometry with Loop Closure
Pierre Dellenbach, Jean-Emmanuel Deschaud, Bastien Jacquet et al.
Multi-beam LiDAR sensors are increasingly used in robotics, particularly with autonomous cars for localization and perception tasks, both relying on the ability to build a precise map of the environment. For this, we propose a new real-time LiDAR-only odometry method called CT-ICP (for Continuous-Time ICP), completed into a full SLAM with a novel loop detection procedure. The core of this method is the introduction of combined continuity in the scan matching and discontinuity between scans. The continuity allows elastic distortion of the scan during registration for increased precision, while the discontinuity increases robustness to high-frequency motions. We build a complete SLAM on top of this odometry, using a fast pure LiDAR loop detection based on 2D matching of elevation images, providing a pose graph with loop constraints. To show the robustness of the method, we tested it on seven datasets: KITTI, KITTI-raw, KITTI-360, KITTI-CARLA, ParisLuco, Newer College, and NCLT, in driving and high-frequency motion scenarios. Both the CT-ICP odometry and the loop detection are available online. CT-ICP is currently first among methods with public code on the KITTI odometry leaderboard, with an average Relative Translation Error (RTE) of 0.59% and an average time per scan of 60 ms on a single CPU thread.
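The "elastic distortion of the scan" can be pictured as giving every point its own pose, interpolated between a pose at the beginning and a pose at the end of the sweep. The sketch below uses linear interpolation of translation and quaternion slerp of rotation, with point timestamps assumed normalized to [0, 1]. It only shows how points are placed for given begin/end poses. CT-ICP itself optimizes those two poses, and the function names here are hypothetical.

```python
import numpy as np

def slerp(q0, q1, alpha):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    q0, q1 = np.asarray(q0, dtype=float), np.asarray(q1, dtype=float)
    dot = np.dot(q0, q1)
    if dot < 0.0:                 # take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:              # nearly identical: normalized linear interpolation
        q = q0 + alpha * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)
    return (np.sin((1 - alpha) * theta) * q0 + np.sin(alpha * theta) * q1) / np.sin(theta)

def quat_to_matrix(q):
    """Rotation matrix of a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def deskew(points, timestamps, q_begin, t_begin, q_end, t_end):
    """Transform each point with the pose interpolated at its timestamp in [0, 1]."""
    out = np.empty_like(points, dtype=float)
    for i, (p, a) in enumerate(zip(points, timestamps)):
        R = quat_to_matrix(slerp(q_begin, q_end, a))
        t = (1 - a) * np.asarray(t_begin, dtype=float) + a * np.asarray(t_end, dtype=float)
        out[i] = R @ p + t
    return out
```

The discontinuity between scans mentioned in the abstract means the begin pose of scan k+1 is a separate variable rather than being forced to equal the end pose of scan k.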
CV · May 12, 2020
Automatic clustering of Celtic coins based on 3D point cloud pattern analysis
Sofiane Horache, Jean-Emmanuel Deschaud, François Goulette et al.
The recognition and clustering of coins that have been struck by the same die is of interest for archeological studies. Nowadays, this work can only be performed by experts and is very tedious. In this paper, we propose a method to automatically cluster dies based on 3D scans of coins. It consists of three steps: registration, comparison and graph-based clustering. Experimental results on 90 coins from a Celtic treasury of the 2nd-1st century BC show a clustering quality equivalent to that of experts.
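The final graph-based clustering step can be illustrated with the simplest graph construction. Coins are nodes, an edge links two coins whose similarity score (from registration and comparison) exceeds a threshold, and clusters are the connected components, found here with union-find. The abstract does not detail the paper's graph clustering, so treat this thresholded version and its function name as a hypothetical sketch.

```python
def cluster_by_threshold(n, similarities, threshold):
    """Connected components of the graph whose edges are pairs with similarity >= threshold.

    n: number of coins; similarities: dict mapping (i, j) index pairs to a score.
    Returns a sorted list of clusters, each a sorted list of coin indices.
    """
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for (i, j), s in similarities.items():
        if s >= threshold:
            parent[find(i)] = find(j)      # merge the two components

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

One weakness of plain connected components is that a single spurious edge merges two dies. This is one reason more robust graph clustering methods are often preferred.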
CV · Apr 18, 2019
KPConv: Flexible and Deformable Convolution for Point Clouds
Hugues Thomas, Charles R. Qi, Jean-Emmanuel Deschaud et al.
We present Kernel Point Convolution (KPConv), a new design of point convolution, i.e. one that operates on point clouds without any intermediate representation. The convolution weights of KPConv are located in Euclidean space by kernel points and applied to the input points close to them. Its capacity to use any number of kernel points gives KPConv more flexibility than fixed grid convolutions. Furthermore, these locations are continuous in space and can be learned by the network. Therefore, KPConv can be extended to deformable convolutions that learn to adapt kernel points to local geometry. Thanks to a regular subsampling strategy, KPConv is also efficient and robust to varying densities. Whether they use deformable KPConv for complex tasks or rigid KPConv for simpler tasks, our networks outperform state-of-the-art classification and segmentation approaches on several datasets. We also offer ablation studies and visualizations to provide an understanding of what has been learned by KPConv and to validate the descriptive power of deformable KPConv.
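The phrase "weights located in Euclidean space by kernel points and applied to the input points close to them" corresponds to a linear correlation between each neighbour and each kernel point. A neighbour at offset y from the center contributes h = max(0, 1 - ||y - x_k|| / σ) times its features multiplied by the weight matrix W_k of kernel point k. Below is a minimal sketch of rigid KPConv at a single location. It assumes the radius neighbourhood has already been gathered, and it omits deformable offsets and batching.

```python
import numpy as np

def kpconv(center, neighbors, features, kernel_points, weights, sigma):
    """Rigid KPConv output at one location.

    center: (3,), neighbors: (N, 3), features: (N, C_in),
    kernel_points: (K, 3) relative to the center, weights: (K, C_in, C_out).
    """
    y = neighbors - center                                               # (N, 3) offsets
    dist = np.linalg.norm(y[:, None, :] - kernel_points[None], axis=-1)  # (N, K)
    h = np.maximum(0.0, 1.0 - dist / sigma)                              # linear correlation
    # Sum over neighbours n and kernel points k of h[n, k] * features[n] @ weights[k].
    return np.einsum("nk,nc,kcd->d", h, features, weights)
```

Since the kernel points are ordinary coordinates, any number of them can be used. The deformable variant predicts per-location shifts of `kernel_points` before this computation.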
CV · Aug 1, 2018
Semantic Classification of 3D Point Clouds with Multiscale Spherical Neighborhoods
Hugues Thomas, Jean-Emmanuel Deschaud, Beatriz Marcotegui et al.
This paper introduces a new definition of multiscale neighborhoods in 3D point clouds. This definition, based on spherical neighborhoods and proportional subsampling, allows the computation of features with a consistent geometrical meaning, which is not the case when using k-nearest neighbors. With an appropriate learning strategy, the proposed features can be used in a random forest to classify 3D points. In this semantic classification task, we show that our multiscale features outperform state-of-the-art features using the same experimental conditions. Furthermore, their classification power competes with more elaborate classification approaches including Deep Learning methods.
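Features "with a consistent geometrical meaning" are typically derived from the eigenvalues of the local covariance matrix. A radius (spherical) neighbourhood covers the same physical extent at every query point, whereas k-nearest neighbours cover a density-dependent one. The sketch below computes three standard eigenvalue features (linearity, planarity, sphericity) at several radii. The exact feature set in the paper may differ, and the proportional subsampling that accompanies each scale is omitted, so the function names and choice of features are assumptions.

```python
import numpy as np

def eigen_features(points, query, radius):
    """Covariance-eigenvalue shape features of the spherical neighbourhood of `query`."""
    nbrs = points[np.linalg.norm(points - query, axis=1) <= radius]
    if len(nbrs) < 3:
        return None                       # too few points for a meaningful covariance
    cov = np.cov(nbrs.T)
    l3, l2, l1 = np.clip(np.linalg.eigvalsh(cov), 0.0, None)  # eigvalsh sorts ascending
    l1 = max(l1, 1e-12)
    return {"linearity": (l1 - l2) / l1,
            "planarity": (l2 - l3) / l1,
            "sphericity": l3 / l1}

def multiscale_features(points, query, radii):
    """One feature dictionary per radius, e.g. radii growing geometrically."""
    return [eigen_features(points, query, r) for r in radii]
```

A random forest would then be trained on the concatenated per-scale features.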
CV · Apr 10, 2018
Classification of Point Cloud Scenes with Multiscale Voxel Deep Network
Xavier Roynard, Jean-Emmanuel Deschaud, François Goulette
In this article we describe a new convolutional neural network (CNN) to classify 3D point clouds of urban or indoor scenes. Solutions are given to the problems encountered working on scene point clouds, and a network is described that allows for point classification using only the position of points in a multi-scale neighborhood. On the reduced-8 Semantic3D benchmark [Hackel et al., 2017], this network, ranked second, beats the state of the art of point classification methods (those not using a regularization step).
LG · Nov 30, 2017
Paris-Lille-3D: a large and high-quality ground truth urban point cloud dataset for automatic segmentation and classification
Xavier Roynard, Jean-Emmanuel Deschaud, François Goulette
This paper introduces a new Urban Point Cloud Dataset for Automatic Segmentation and Classification acquired by Mobile Laser Scanning (MLS). We describe how the dataset is obtained, from acquisition to post-processing and labeling. This dataset can be used to learn classification algorithms; however, given that great attention has been paid to the split between the different objects, it can also be used to learn segmentation. The dataset consists of around 2 km of MLS point cloud acquired in two cities. The number of points and range of classes lead us to consider it suitable for training Deep Learning methods. We also show some results of automatic segmentation and classification. The dataset is available at: http://caor-mines-paristech.fr/fr/paris-lille-3d-dataset/
SY · Mar 4, 2015
Invariant EKF Design for Scan Matching-aided Localization
Martin Barczyk, Silvère Bonnabel, Jean-Emmanuel Deschaud et al.
Localization in indoor environments is a technique which estimates the robot's pose by fusing data from onboard motion sensors with readings of the environment, in our case obtained by scan matching point clouds captured by a low-cost Kinect depth camera. We develop both an Invariant Extended Kalman Filter (IEKF)-based and a Multiplicative Extended Kalman Filter (MEKF)-based solution to this problem. The two designs are successfully validated in experiments and demonstrate the advantage of the IEKF design.
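RO · Dec 3, 2014
Colorisation et texturation temps réel d'environnements urbains par système mobile avec scanner laser et caméra fish-eye (Real-time colorization and texturing of urban environments by a mobile system with a laser scanner and a fish-eye camera)
Jean-Emmanuel Deschaud, Xavier Brun, François Goulette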
We present a real-time mobile mapping system mounted on a vehicle. The terrestrial acquisition system is based on a geolocation system and two sensors: a laser scanner and a camera with a fish-eye lens. We produce colored 3D point clouds and textured models of the environment. Once the system has been calibrated, data acquisition and processing are done on the fly. This article mainly presents our methods for point cloud colorization, triangulation and texture mapping.
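Point cloud colorization comes down to projecting each calibrated, camera-frame point into the fish-eye image and reading the pixel there. The abstract does not give the lens model, so the sketch below assumes the common equidistant model r = f·θ, where θ is the angle from the optical axis. It also assumes a known focal length `f` and principal point (`cx`, `cy`). Occlusion handling and interpolation between pixels are left out, and the function names are hypothetical.

```python
import numpy as np

def project_equidistant(points_cam, f, cx, cy):
    """Project camera-frame points (z forward) with an equidistant fish-eye model r = f * theta."""
    x, y, z = points_cam.T
    theta = np.arctan2(np.hypot(x, y), z)   # angle from the optical axis
    phi = np.arctan2(y, x)                  # direction around the axis
    r = f * theta                           # radial distance in the image, in pixels
    return np.stack([cx + r * np.cos(phi), cy + r * np.sin(phi)], axis=1)

def colorize(points_cam, image, f, cx, cy):
    """Nearest-pixel color for each point; points projecting outside the image get NaN."""
    uv = np.rint(project_equidistant(points_cam, f, cx, cy)).astype(int)
    h, w = image.shape[:2]
    colors = np.full((len(points_cam), image.shape[2]), np.nan)
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    colors[ok] = image[uv[ok, 1], uv[ok, 0]]   # rows are v, columns are u
    return colors
```

Unlike a pinhole model, the equidistant mapping stays finite as θ approaches 90 degrees, which is what lets a fish-eye lens cover such a wide field of view.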
CV · Oct 16, 2014
On the Covariance of ICP-based Scan-matching Techniques
Silvère Bonnabel, Martin Barczyk, François Goulette
This paper considers the problem of estimating the covariance of roto-translations computed by the Iterative Closest Point (ICP) algorithm. The problem is relevant for localization of mobile robots and vehicles equipped with depth-sensing cameras (e.g., Kinect) or Lidar (e.g., Velodyne). The closed-form formulas for covariance proposed in previous literature generally build upon the fact that the solution to ICP is obtained by minimizing a linear least-squares problem. In this paper, we show that this approach needs caution because the rematching step of the algorithm is not explicitly accounted for, and applying it to the point-to-point version of ICP leads to completely erroneous covariances. We then provide a formal mathematical proof of why the approach is valid in the point-to-plane version of ICP, which validates the intuition and experimental results of practitioners.
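For the point-to-plane case, which the abstract says the linear least-squares approach handles correctly, the closed-form covariance is σ²(JᵀJ)⁻¹. Each row of J linearizes the residual nᵢ·(R pᵢ + t − qᵢ) around the solution. With a small rotation ω, nᵢ·(ω × pᵢ) = ω·(pᵢ × nᵢ), so the row is [pᵢ × nᵢ, nᵢ]. The sketch below assumes isotropic residual noise σ and matched points and normals at convergence. The function name is hypothetical.

```python
import numpy as np

def point_to_plane_covariance(points, normals, sigma):
    """Approximate 6x6 covariance of the ICP estimate, ordered (rotation ω, translation t).

    points: (N, 3) source points at the solution, normals: (N, 3) unit normals of the
    matched target planes, sigma: standard deviation of the point-to-plane residuals.
    """
    J = np.hstack([np.cross(points, normals), normals])   # (N, 6) Jacobian rows
    return sigma ** 2 * np.linalg.inv(J.T @ J)
```

The formula also exposes degenerate geometry. If every normal is parallel (e.g. a scan of a flat floor), JᵀJ is singular because translation within the plane is unconstrained.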
SY · Mar 20, 2014
Experimental Implementation of an Invariant Extended Kalman Filter-based Scan Matching SLAM
Martin Barczyk, Silvère Bonnabel, Jean-Emmanuel Deschaud et al.
We describe an application of the Invariant Extended Kalman Filter (IEKF) design methodology to the scan matching SLAM problem. We review the theoretical foundations of the IEKF and its practical interest of guaranteeing robustness to poor state estimates, then implement the filter on a wheeled robot hardware platform. The proposed design is successfully validated in experimental testing.
RO · May 16, 2012
Accurate 3D maps from depth images and motion sensors via nonlinear Kalman filtering
Thibault Hervier, Silvère Bonnabel, François Goulette
This paper investigates the use of depth images as localisation sensors for 3D map building. The localisation information is derived from the 3D data thanks to the ICP (Iterative Closest Point) algorithm. The covariance of the ICP, and thus of the localization error, is analysed and described by a Fisher Information Matrix. We argue that this error can be much reduced if the data is fused with measurements from other motion sensors, or even with prior knowledge of the motion. The data fusion is performed by a recently introduced specific extended Kalman filter, the so-called Invariant EKF, and is directly based on the estimated covariance of the ICP. The resulting filter is very natural and is proved to possess strong properties. Experiments with a Kinect sensor and a three-axis gyroscope show a clear improvement in the accuracy of the localization, and thus in the accuracy of the built 3D map.