CVMar 3, 2023Code
MixVPR: Feature Mixing for Visual Place RecognitionAmar Ali-bey, Brahim Chaib-draa, Philippe Giguère
Visual Place Recognition (VPR) is a crucial part of mobile robotics and autonomous driving as well as other computer vision tasks. It refers to the process of identifying a place depicted in a query image using only computer vision. At large scale, repetitive structures, weather and illumination changes pose a real challenge, as appearances can drastically change over time. Along with tackling these challenges, an efficient VPR technique must also be practical in real-world scenarios where latency matters. To address this, we introduce MixVPR, a new holistic feature aggregation technique that takes feature maps from pre-trained backbones as a set of global features. Then, it incorporates a global relationship between elements in each feature map in a cascade of feature mixing, eliminating the need for local or pyramidal aggregation as done in NetVLAD or TransVPR. We demonstrate the effectiveness of our technique through extensive experiments on multiple large-scale benchmarks. Our method outperforms all existing techniques by a large margin while having less than half the number of parameters compared to CosPlace and NetVLAD. We achieve a new all-time high recall@1 score of 94.6% on Pitts250k-test, 88.0% on MapillarySLS, and more importantly, 58.4% on Nordland. Finally, our method outperforms two-stage retrieval techniques such as Patch-NetVLAD, TransVPR and SuperGLUE all while being orders of magnitude faster. Our code and trained models are available at https://github.com/amaralibey/MixVPR.
CVOct 19, 2022Code
GSV-Cities: Toward Appropriate Supervised Visual Place RecognitionAmar Ali-bey, Brahim Chaib-draa, Philippe Giguère
This paper aims to investigate representation learning for large scale visual place recognition, which consists of determining the location depicted in a query image by referring to a database of reference images. This is a challenging task due to the large-scale environmental changes that can occur over time (i.e., weather, illumination, season, traffic, occlusion). Progress is currently challenged by the lack of large databases with accurate ground truth. To address this challenge, we introduce GSV-Cities, a new image dataset providing the widest geographic coverage to date with highly accurate ground truth, covering more than 40 cities across all continents over a 14-year period. We subsequently explore the full potential of recent advances in deep metric learning to train networks specifically for place recognition, and evaluate how different loss functions influence performance. In addition, we show that performance of existing methods substantially improves when trained on GSV-Cities. Finally, we introduce a new fully convolutional aggregation layer that outperforms existing techniques, including GeM, NetVLAD and CosPlace, and establish a new state-of-the-art on large-scale benchmarks, such as Pittsburgh, Mapillary-SLS, SPED and Nordland. The dataset and code are available for research purposes at https://github.com/amaralibey/gsv-cities.
CVOct 31, 2022Code
Tree Detection and Diameter Estimation Based on Deep LearningVincent Grondin, Jean-Michel Fortin, François Pomerleau et al.
Tree perception is an essential building block toward autonomous forestry operations. Current developments generally consider input data from lidar sensors to solve forest navigation, tree detection and diameter estimation problems. Whereas cameras paired with deep learning algorithms usually address species classification or forest anomaly detection. In either of these cases, data unavailability and forest diversity restrain deep learning developments for autonomous systems. So, we propose two densely annotated image datasets - 43k synthetic, 100 real - for bounding box, segmentation mask and keypoint detections to assess the potential of vision-based methods. Deep neural network models trained on our datasets achieve a precision of 90.4% for tree detection, 87.2% for tree segmentation, and centimeter accurate keypoint estimations. We measure our models' generalizability when testing it on other forest datasets, and their scalability with different dataset sizes and architectural improvements. Overall, the experimental results offer promising avenues toward autonomous tree felling operations and other applied forestry problems. The datasets and pre-trained models in this article are publicly available on \href{https://github.com/norlab-ulaval/PercepTreeV1}{GitHub} (https://github.com/norlab-ulaval/PercepTreeV1).
49.4ROMay 13Code
Saturation-Aware Angular Velocity Estimation: Extending the Robustness of SLAM to Aggressive MotionsSimon-Pierre Deschênes, Dominic Baril, Matěj Boxan et al.
We propose a novel angular velocity estimation method to increase the robustness of Simultaneous Localization And Mapping (SLAM) algorithms against gyroscope saturations induced by aggressive motions. Field robotics expose robots to various hazards, including steep terrains, landslides, and staircases, where substantial accelerations and angular velocities can occur if the robot loses stability and tumbles. These extreme motions can saturate sensor measurements, especially gyroscopes, which are the first sensors to become inoperative. While the structural integrity of the robot is at risk, the robustness of the SLAM framework is oftentimes given little consideration. Consequently, even if the robot is physically capable of continuing the mission, its operation will be compromised due to a corrupted representation of the world. Regarding this problem, we propose a method to estimate the angular velocity using accelerometers during extreme rotations caused by tumbling. We show that our method reduces the median localization error by 71.5 % in translation and 65.5 % in rotation and is robust to mapping failures, which occurred in 37.5 % of the experiments without our method. We also propose the Tumbling-Induced Gyroscope Saturation (TIGS) dataset, which consists of outdoor experiments recording the motion of a mechanical lidar subject to angular velocities four times higher than other similar datasets available. The dataset is available online at https://github.com/norlab-ulaval/Norlab_wiki/wiki/TIGS-Dataset.
CVFeb 28, 2023Code
Global Proxy-based Hard Mining for Visual Place RecognitionAmar Ali-bey, Brahim Chaib-draa, Philippe Giguère
Learning deep representations for visual place recognition is commonly performed using pairwise or triple loss functions that highly depend on the hardness of the examples sampled at each training iteration. Existing techniques address this by using computationally and memory expensive offline hard mining, which consists of identifying, at each iteration, the hardest samples from the training set. In this paper we introduce a new technique that performs global hard mini-batch sampling based on proxies. To do so, we add a new end-to-end trainable branch to the network, which generates efficient place descriptors (one proxy for each place). These proxy representations are thus used to construct a global index that encompasses the similarities between all places in the dataset, allowing for highly informative mini-batch sampling at each training iteration. Our method can be used in combination with all existing pairwise and triplet loss functions with negligible additional memory and computation cost. We run extensive ablation studies and show that our technique brings new state-of-the-art performance on multiple large-scale benchmarks such as Pittsburgh, Mapillary-SLS and SPED. In particular, our method provides more than 100% relative improvement on the challenging Nordland dataset. Our code is available at https://github.com/amaralibey/GPM
CVOct 8, 2022Code
Training Deep Learning Algorithms on Synthetic Forest Images for Tree DetectionVincent Grondin, François Pomerleau, Philippe Giguère
Vision-based segmentation in forested environments is a key functionality for autonomous forestry operations such as tree felling and forwarding. Deep learning algorithms demonstrate promising results to perform visual tasks such as object detection. However, the supervised learning process of these algorithms requires annotations from a large diversity of images. In this work, we propose to use simulated forest environments to automatically generate 43 k realistic synthetic images with pixel-level annotations, and use it to train deep learning algorithms for tree detection. This allows us to address the following questions: i) what kind of performance should we expect from deep learning in harsh synthetic forest environments, ii) which annotations are the most important for training, and iii) what modality should be used between RGB and depth. We also report the promising transfer learning capability of features learned on our synthetic dataset by directly predicting bounding box, segmentation masks and keypoints on real images. Code available on GitHub (https://github.com/norlab-ulaval/PercepTreeV1).
CVMar 3, 2022
Instance Segmentation for Autonomous Log Grasping in Forestry OperationsJean-Michel Fortin, Olivier Gamache, Vincent Grondin et al.
Wood logs picking is a challenging task to automate. Indeed, logs usually come in cluttered configurations, randomly orientated and overlapping. Recent work on log picking automation usually assume that the logs' pose is known, with little consideration given to the actual perception problem. In this paper, we squarely address the latter, using a data-driven approach. First, we introduce a novel dataset, named TimberSeg 1.0, that is densely annotated, i.e., that includes both bounding boxes and pixel-level mask annotations for logs. This dataset comprises 220 images with 2500 individually segmented logs. Using our dataset, we then compare three neural network architectures on the task of individual logs detection and segmentation; two region-based methods and one attention-based method. Unsurprisingly, our results show that axis-aligned proposals, failing to take into account the directional nature of logs, underperform with 19.03 mAP. A rotation-aware proposal method significantly improve results to 31.83 mAP. More interestingly, a Transformer-based approach, without any inductive bias on rotations, outperformed the two others, achieving a mAP of 57.53 on our dataset. Our use case demonstrates the limitations of region-based approaches for cluttered, elongated objects. It also highlights the potential of attention-based methods on this specific task, as they work directly at the pixel-level. These encouraging results indicate that such a perception system could be used to assist the operators on the short-term, or to fully automate log picking operations in the future.
9.0CVMay 7
Leveraging Image Generators to Address Training Data Scarcity: The Gen4Regen Dataset for Forest Regeneration MappingGabriel Jeanson, David-Alexandre Duclos, William Larrivée-Hardy et al.
Sustainable forest management relies on precise species composition mapping, yet traditional ground surveys are labour-intensive and geographically constrained. While Uncrewed Aerial Vehicles (UAVs) offer scalable data collection, the transition to deep learning-based interpretation is bottlenecked by the severe scarcity of expert-annotated imagery, particularly in complex, visually heterogeneous regeneration zones. This paper addresses the dual challenges of data scarcity and extreme class imbalance in the semantic segmentation of fine-grained forest regeneration species by providing a scalable framework that reduces reliance on manual photo-interpretation for high-resolution, millimetre-level aerial imagery. Importantly, we leverage the large-scale vision-language Nano Banana Pro model to simultaneously generate high-fidelity images and their corresponding pixel-aligned semantic masks from prompts. We introduce WilDReF-Q-V2, an expansion of a natural forest dataset with 13 977 new unlabelled and 50 labelled real images, as well as the Gen4Regen dataset, featuring 2101 pairs of synthetic images and semantic masks. Our methodology integrates real-world data with AI-generated images, highlighting that AI-generated data is highly complementary to real-world data, with unified training yielding an F1 score improvement of over 15 %pt compared to purely supervised baselines. Furthermore, we demonstrate that even small quantities of prompt-generated data significantly improve performance for underrepresented species, some of which saw per-species F1 score gains of up to 30 %pt. We conclude that vision-language models can serve as agile data generators, effectively bootstrapping perception tasks for niche AI domains where expert labels are scarce or unavailable. Our datasets, source code, and models will be available at https://norlab-ulaval.github.io/gen4regen.
CVJul 4, 2023
MaskBEV: Joint Object Detection and Footprint Completion for Bird's-eye View 3D Point CloudsWilliam Guimont-Martin, Jean-Michel Fortin, François Pomerleau et al.
Recent works in object detection in LiDAR point clouds mostly focus on predicting bounding boxes around objects. This prediction is commonly achieved using anchor-based or anchor-free detectors that predict bounding boxes, requiring significant explicit prior knowledge about the objects to work properly. To remedy these limitations, we propose MaskBEV, a bird's-eye view (BEV) mask-based object detector neural architecture. MaskBEV predicts a set of BEV instance masks that represent the footprints of detected objects. Moreover, our approach allows object detection and footprint completion in a single pass. MaskBEV also reformulates the detection problem purely in terms of classification, doing away with regression usually done to predict bounding boxes. We evaluate the performance of MaskBEV on both SemanticKITTI and KITTI datasets while analyzing the architecture advantages and limitations.
CVMay 12, 2024Code
BoQ: A Place is Worth a Bag of Learnable QueriesAmar Ali-Bey, Brahim Chaib-draa, Philippe Giguère
In visual place recognition, accurately identifying and matching images of locations under varying environmental conditions and viewpoints remains a significant challenge. In this paper, we introduce a new technique, called Bag-of-Queries (BoQ), which learns a set of global queries designed to capture universal place-specific attributes. Unlike existing methods that employ self-attention and generate the queries directly from the input features, BoQ employs distinct learnable global queries, which probe the input features via cross-attention, ensuring consistent information aggregation. In addition, our technique provides an interpretable attention mechanism and integrates with both CNN and Vision Transformer backbones. The performance of BoQ is demonstrated through extensive experiments on 14 large-scale benchmarks. It consistently outperforms current state-of-the-art techniques including NetVLAD, MixVPR and EigenPlaces. Moreover, as a global retrieval technique (one-stage), BoQ surpasses two-stage retrieval methods, such as Patch-NetVLAD, TransVPR and R2Former, all while being orders of magnitude faster and more efficient. The code and model weights are publicly available at https://github.com/amaralibey/Bag-of-Queries.
20.0LGMay 19
Symmetrization of Loss Functions for Robust Training of Neural Networks in the Presence of Noisy LabelsAlexandre Lemire Paquin, Brahim Chaib-Draa, Philippe Giguère
Labeling a training set is often expensive and susceptible to errors, making the design of robust loss functions for label noise an important problem. The symmetry condition provides theoretical guarantees for robustness to such noise. In this work, we study a symmetrization method arising from the unique decomposition of any multi-class loss function into a symmetric component and a class-insensitive term. In particular, symmetrizing the cross-entropy loss leads to a linear multi-class extension of the unhinged loss. Unlike in the binary case, the multi-class version must have specific coefficients in order to satisfy the symmetry condition. Under suitable assumptions, we show that this multi-class unhinged loss is the unique convex multi-class symmetric loss. We also show that it has a fundamental local role: the linear approximation of any symmetric loss around score vectors with equal components is equivalent to the multi-class unhinged loss. We then introduce SGCE and alpha-MAE, two loss functions that interpolate between the multi-class unhinged loss and the Mean Absolute Error while allowing control of the beta-smoothness of the loss. Experiments on standard noisy-label benchmarks show competitive performance compared with existing robust loss functions.
31.3ROMay 17
Stretch-ICP: A Continuous-Trajectory Registration and Deskewing Algorithm in Scenarios of Aggressive MotionsSimon-Pierre Deschênes, Veronica Vannini, Philippe Giguère et al.
Robust robotic autonomy remains challenging in complex environments, where loss of stability on uneven or slippery terrain can induce extreme accelerations and angular velocities. Such motions corrupt sensor measurements and degrade state estimation, motivating the need for improved algorithmic robustness. To investigate this issue, we introduce the Tumbling-Induced Gyroscope Saturation (TIGS) dataset, which consists of recordings from a mechanical lidar and an Inertial Measurement Unit (IMU) tumbling down a hill. The dataset contains angular speeds up to four times higher than those in similar datasets and is publicly available. We then propose two complementary methods to improve Simultaneous Localization And Mapping (SLAM) robustness and evaluate them on TIGS. First, Saturation-Aware Angular Velocity Estimation (SAAVE) estimates angular velocities when gyroscope measurements become saturated during aggressive motions, reducing angular speed estimation error by 83.4%. Second, Stretch-ICP, a novel registration and deskewing algorithm, enables reconstruction of smoother 6-Degrees Of Freedom (DOF) trajectories under aggressive motions compared to classical Iterative Closest Point (ICP). Stretch-ICP reduces linear and angular velocity errors by 95.2% and 94.8%, respectively, at scan boundaries. Together, these contributions improve the robustness and consistency of lidar-inertial state estimation under aggressive motions.
ROMar 25, 2024Code
Proprioception Is All You Need: Terrain Classification for Boreal ForestsDamien LaRocque, William Guimont-Martin, David-Alexandre Duclos et al.
Recent works in field robotics highlighted the importance of resiliency against different types of terrains. Boreal forests, in particular, are home to many mobility-impeding terrains that should be considered for off-road autonomous navigation. Also, being one of the largest land biomes on Earth, boreal forests are an area where autonomous vehicles are expected to become increasingly common. In this paper, we address this issue by introducing BorealTC, a publicly available dataset for proprioceptive-based terrain classification (TC). Recorded with a Husky A200, our dataset contains 116 min of Inertial Measurement Unit (IMU), motor current, and wheel odometry data, focusing on typical boreal forest terrains, notably snow, ice, and silty loam. Combining our dataset with another dataset from the state-of-the-art, we evaluate both a Convolutional Neural Network (CNN) and the novel state space model (SSM)-based Mamba architecture on a TC task. Interestingly, we show that while CNN outperforms Mamba on each separate dataset, Mamba achieves greater accuracy when trained on a combination of both. In addition, we demonstrate that Mamba's learning capacity is greater than a CNN for increasing amounts of data. We show that the combination of two TC datasets yields a latent space that can be interpreted with the properties of the terrains. We also discuss the implications of merging datasets on classification. Our source code and dataset are publicly available online: https://github.com/norlab-ulaval/BorealTC.
CVOct 10, 2025Code
SilvaScenes: Tree Segmentation and Species Classification from Under-Canopy Images in Natural ForestsDavid-Alexandre Duclos, William Guimont-Martin, Gabriel Jeanson et al.
Interest in robotics for forest management is growing, but perception in complex, natural environments remains a significant hurdle. Conditions such as heavy occlusion, variable lighting, and dense vegetation pose challenges to automated systems, which are essential for precision forestry, biodiversity monitoring, and the automation of forestry equipment. These tasks rely on advanced perceptual capabilities, such as detection and fine-grained species classification of individual trees. Yet, existing datasets are inadequate to develop such perception systems, as they often focus on urban settings or a limited number of species. To address this, we present SilvaScenes, a new dataset for instance segmentation of tree species from under-canopy images. Collected across five bioclimatic domains in Quebec, Canada, SilvaScenes features 1476 trees from 24 species with annotations from forestry experts. We demonstrate the relevance and challenging nature of our dataset by benchmarking modern deep learning approaches for instance segmentation. Our results show that, while tree segmentation is easy, with a top mean average precision (mAP) of 67.65%, species classification remains a significant challenge with an mAP of only 35.69%. Our dataset and source code will be available at https://github.com/norlab-ulaval/SilvaScenes.
ROJun 19, 2025Code
Reproducible Evaluation of Camera Auto-Exposure Methods in the Field: Platform, Benchmark and Lessons LearnedOlivier Gamache, Jean-Michel Fortin, Matěj Boxan et al.
Standard datasets often present limitations, particularly due to the fixed nature of input data sensors, which makes it difficult to compare methods that actively adjust sensor parameters to suit environmental conditions. This is the case with Automatic-Exposure (AE) methods, which rely on environmental factors to influence the image acquisition process. As a result, AE methods have traditionally been benchmarked in an online manner, rendering experiments non-reproducible. Building on our prior work, we propose a methodology that utilizes an emulator capable of generating images at any exposure time. This approach leverages BorealHDR, a unique multi-exposure stereo dataset, along with its new extension, in which data was acquired along a repeated trajectory at different times of the day to assess the impact of changing illumination. In total, BorealHDR covers 13.4 km over 59 trajectories in challenging lighting conditions. The dataset also includes lidar-inertial-odometry-based maps with pose estimation for each image frame, as well as Global Navigation Satellite System (GNSS) data for comparison. We demonstrate that by using images acquired at various exposure times, we can emulate realistic images with a Root-Mean-Square Error (RMSE) below 1.78% compared to ground truth images. Using this offline approach, we benchmarked eight AE methods, concluding that the classical AE method remains the field's best performer. To further support reproducibility, we provide in-depth details on the development of our backpack acquisition platform, including hardware, electrical components, and performance specifications. Additionally, we share valuable lessons learned from deploying the backpack over more than 25 km across various environments. Our code and dataset are available online at this link: https://github.com/norlab-ulaval/TFR24 BorealHDR
ROMay 6, 2019Code
ReFusion: 3D Reconstruction in Dynamic Environments for RGB-D Cameras Exploiting ResidualsEmanuele Palazzolo, Jens Behley, Philipp Lottes et al.
Mapping and localization are essential capabilities of robotic systems. Although the majority of mapping systems focus on static environments, the deployment in real-world situations requires them to handle dynamic objects. In this paper, we propose an approach for an RGB-D sensor that is able to consistently map scenes containing multiple dynamic elements. For localization and mapping, we employ an efficient direct tracking on the truncated signed distance function (TSDF) and leverage color information encoded in the TSDF to estimate the pose of the sensor. The TSDF is efficiently represented using voxel hashing, with most computations parallelized on a GPU. For detecting dynamics, we exploit the residuals obtained after an initial registration, together with the explicit modeling of free space in the model. We evaluate our approach on existing datasets, and provide a new dataset showing highly dynamic scenes. These experiments show that our approach often surpass other state-of-the-art dense SLAM methods. We make available our dataset with the ground truth for both the trajectory of the RGB-D sensor obtained by a motion capture system and the model of the static environment using a high-precision terrestrial laser scanner. Finally, we release our approach as open source code.
RONov 27, 2021
Kilometer-scale autonomous navigation in subarctic forests: challenges and lessons learnedDominic Baril, Simon-Pierre Deschênes, Olivier Gamache et al.
Challenges inherent to autonomous wintertime navigation in forests include lack of reliable a Global Navigation Satellite System (GNSS) signal, low feature contrast, high illumination variations and changing environment. This type of off-road environment is an extreme case of situations autonomous cars could encounter in northern regions. Thus, it is important to understand the impact of this harsh environment on autonomous navigation systems. To this end, we present a field report analyzing teach-and-repeat navigation in a subarctic forest while subject to fluctuating weather, including light and heavy snow, rain and drizzle. First, we describe the system, which relies on point cloud registration to localize a mobile robot through a boreal forest, while simultaneously building a map. We experimentally evaluate this system in over 18.8 km of autonomous navigation in the teach-and-repeat mode. Over 14 repeat runs, only four manual interventions were required, three of which were due to localization failure and another one caused by battery power outage. We show that dense vegetation perturbs the GNSS signal, rendering it unsuitable for navigation in forest trails. Furthermore, we highlight the increased uncertainty related to localizing using point cloud registration in forest trails. We demonstrate that it is not snow precipitation, but snow accumulation, that affects our system's ability to localize within the environment. Finally, we expose some challenges and lessons learned from our field campaign to support better experimental work in winter conditions. Our dataset is available online.
ROMay 24, 2021
SuMa++: Efficient LiDAR-based Semantic SLAMXieyuanli Chen, Andres Milioto, Emanuele Palazzolo et al.
Reliable and accurate localization and mapping are key components of most autonomous systems. Besides geometric information about the mapped environment, the semantics plays an important role to enable intelligent navigation behaviors. In most realistic environments, this task is particularly complicated due to dynamics caused by moving objects, which can corrupt the mapping step or derail localization. In this paper, we propose an extension of a recently published surfel-based mapping approach exploiting three-dimensional laser range scans by integrating semantic information to facilitate the mapping process. The semantic information is efficiently extracted by a fully convolutional neural network and rendered on a spherical projection of the laser range data. This computed semantic segmentation results in point-wise labels for the whole scan, allowing us to build a semantically-enriched map with labeled surfels. This semantic map enables us to reliably filter moving objects, but also improve the projective scan matching via semantic constraints. Our experimental evaluation on challenging highways sequences from KITTI dataset with very few static structures and a large amount of moving cars shows the advantage of our semantic SLAM approach in comparison to a purely geometric, state-of-the-art approach.
ROMay 3, 2021
Lidar Scan Registration Robust to Extreme MotionsSimon-Pierre Deschênes, Dominic Baril, Vladimír Kubelka et al.
Registration algorithms, such as Iterative Closest Point (ICP), have proven effective in mobile robot localization algorithms over the last decades. However, they are susceptible to failure when a robot sustains extreme velocities and accelerations. For example, this kind of motion can happen after a collision, causing a point cloud to be heavily skewed. While point cloud de-skewing methods have been explored in the past to increase localization and mapping accuracy, these methods still rely on highly accurate odometry systems or ideal navigation conditions. In this paper, we present a method taking into account the remaining motion uncertainties of the trajectory used to de-skew a point cloud along with the environment geometry to increase the robustness of current registration algorithms. We compare our method to three other solutions in a test bench producing 3D maps with peak accelerations of 200 m/s^2 and 800 rad/s^2. In these extreme scenarios, we demonstrate that our method decreases the error by 9.26 % in translation and by 21.84 % in rotation. The proposed method is generic enough to be integrated to many variants of weighted ICP without adaptation and supports localization robustness in harsher terrains.
ROApr 29, 2021
Accurate outdoor ground truth based on total stationsMaxime Vaidis, Philippe Giguère, François Pomerleau et al.
In robotics, accurate ground-truth position fostered the development of mapping and localization algorithms through the creation of cornerstone datasets. In outdoor environments and over long distances, total stations are the most accurate and precise measurement instruments for this purpose. Most total station-based systems in the literature are limited to three Degrees Of Freedoms (DOFs), due to the use of a single-prism tracking approach. In this paper, we present preliminary work on measuring a full pose of a vehicle, bringing the referencing system to six DOFs. Three total stations are used to track in real time three prisms attached to a target platform. We describe the structure of the referencing system and the protocol for acquiring the ground truth with this system. We evaluated its precision in a variety of different outdoor environments, ranging from open-sky to forest trails, and compare this system with another popular source of reference position, the Real Time Kinematics (RTK) positioning solution. Results show that our approach is the most precise, reaching an average positional error of 10 mm and 0.6 deg. This difference in performance was particularly stark in environments where Global Navigation Satellite System (GNSS) signals can be weaker due to overreaching vegetation.
ROApr 10, 2020
Evaluation of Skid-Steering Kinematic Models for Subarctic EnvironmentsDominic Baril, Vincent Grondin, Simon-Pierre Deschênes et al.
In subarctic and arctic areas, large and heavy skid-steered robots are preferred for their robustness and ability to operate on difficult terrain. State estimation, motion control and path planning for these robots rely on accurate odometry models based on wheel velocities. However, the state-of-the-art odometry models for skid-steer mobile robots (SSMRs) have usually been tested on relatively lightweight platforms. In this paper, we focus on how these models perform when deployed on a large and heavy (590 kg) SSMR. We collected more than 2 km of data on both snow and concrete. We compare the ideal differential-drive, extended differential-drive, radius-of-curvature-based, and full linear kinematic models commonly deployed for SSMRs. Each of the models is fine-tuned by searching their optimal parameters on both snow and concrete. We then discuss the relationship between the parameters, the model tuning, and the final accuracy of the models.
LGJan 29, 2020
The Indian Chefs ProcessPatrick Dallaire, Luca Ambrogioni, Ludovic Trottier et al.
This paper introduces the Indian Chefs Process (ICP), a Bayesian nonparametric prior on the joint space of infinite directed acyclic graphs (DAGs) and orders that generalizes Indian Buffet Processes. As our construction shows, the proposed distribution relies on a latent Beta Process controlling both the orders and outgoing connection probabilities of the nodes, and yields a probability distribution on sparse infinite graphs. The main advantage of the ICP over previously proposed Bayesian nonparametric priors for DAG structures is its greater flexibility. To the best of our knowledge, the ICP is the first Bayesian nonparametric model supporting every possible DAG. We demonstrate the usefulness of the ICP on learning the structure of deep generative sigmoid networks as well as convolutional neural networks.
LGDec 6, 2019
Tree bark re-identification using a deep-learning feature descriptorMartin Robert, Patrick Dallaire, Philippe Giguère
The ability to visually re-identify objects is a fundamental capability in vision systems. Oftentimes, it relies on collections of visual signatures based on descriptors, such as SIFT or SURF. However, these traditional descriptors were designed for a certain domain of surface appearances and geometries (limited relief). Consequently, highly-textured surfaces such as tree bark pose a challenge to them. In turn, this makes it more difficult to use trees as identifiable landmarks for navigational purposes (robotics) or to track felled lumber along a supply chain (logistics). We thus propose to use data-driven descriptors trained on bark images for tree surface re-identification. To this effect, we collected a large dataset containing 2,400 bark images with strong illumination changes, annotated by surface and with the ability to pixel-align them. We used this dataset to sample from more than 2 million 64x64 pixel patches to train our novel local descriptors DeepBark and SqueezeBark. Our DeepBark method has shown a clear advantage against the hand-crafted descriptors SIFT and SURF. For instance, we demonstrated that DeepBark can reach a mAP of 87.2% when retrieving 11 relevant bark images, i.e. corresponding to the same physical surface, to a bark query against 7,900 images. Our work thus suggests that re-identifying tree surfaces in a challenging illuminations context is possible. We also make public our dataset, which can be used to benchmark surface re-identification techniques.
CVNov 26, 2019
Deep Template-based Object Instance DetectionJean-Philippe Mercier, Mathieu Garon, Philippe Giguère et al.
Much of the focus in the object detection literature has been on the problem of identifying the bounding box of a particular class of object in an image. Yet, in contexts such as robotics and augmented reality, it is often necessary to find a specific object instance---a unique toy or a custom industrial part for example---rather than a generic object class. Here, applications can require a rapid shift from one object instance to another, thus requiring fast turnaround which affords little-to-no training time. What is more, gathering a dataset and training a model for every new object instance to be detected can be an expensive and time-consuming process. In this context, we propose a generic 2D object instance detection approach that uses example viewpoints of the target object at test time to retrieve its 2D location in RGB images, without requiring any additional training (i.e. fine-tuning) step. To this end, we present an end-to-end architecture that extracts global and local information of the object from its viewpoints. The global information is used to tune early filters in the backbone while local viewpoints are correlated with the input image. Our method offers an improvement of almost 30 mAP over the previous template matching methods on the challenging Occluded Linemod dataset (overall mAP of 50.7). Our experiments also show that our single generic model (not trained on any of the test objects) yields detection results that are on par with approaches that are trained specifically on the target objects.
ROOct 26, 2019
Driving Datasets Literature ReviewCharles-Éric Noël Laflamme, François Pomerleau, Philippe Giguère
This report is a survey of the different autonomous driving datasets which have been published up to date. The first section introduces the many sensor types used in autonomous driving datasets. The second section investigates the calibration and synchronization procedure required to generate accurate data. The third section describes the diverse driving tasks explored by the datasets. Finally, the fourth section provides comprehensive lists of datasets, mainly in the form of tables.
ROApr 16, 2019
Predicting GNSS satellite visibility from dense point cloudsPhilippe Dandurand, Philippe Babin, Vladimır Kubelka et al.
To help future mobile agents plan their movement in harsh environments,a predictive model has been designed to determine what areas would be favorable for Global Navigation Satellite System (GNSS) positioning. The model is able to predict the number of viable satellites for a GNSS receiver, based on a 3D point cloud map and a satellite constellation. Both occlusion and absorption effects of the environment are considered. A rugged mobile platform was designed to collect data in order to generate the point cloud maps. It was deployed during the Canadian winter known for large amounts of snow and extremely low temperatures. The test environments include a highly dense boreal forest and a university campus with high buildings. The experiment results indicate that the model performs well in both structured and unstructured environments
ROApr 16, 2019
Large-scale 3D Mapping of Subarctic ForestsPhilippe Babin, Philippe Dandurand, Vladimír Kubelka et al.
The ability to map challenging subarctic environments opens new horizons for robotic deployments in industries such as forestry, surveillance, and open-pit mining. In this paper, we explore possibilities of large-scale lidar mapping in a boreal forest. Computational and sensory requirements with regards to contemporary hardware are considered as well. The lidar mapping is often based on the SLAM technique relying on pose graph optimization, which fuses the Iterative Closest Point (ICP) algorithm, Global Navigation Satellite System (GNSS) positioning, and Inertial Measurement Unit (IMU) measurements. To handle those sensors directly within the ICP minimization process, we propose an alternative approach of embedding external constraints. Furthermore, a novel formulation of a cost function is presented and cast into the problem of handling uncertainties from GNSS and lidar points. To test our approach, we acquired a large-scale dataset in the Foret Montmorency research forest. We report on the technical problems faced during our winter deployments aiming at building 3D maps using our new cost function. Those maps demonstrate both global and local consistency over 4.1km.
ROApr 10, 2019
Automatic 3D Mapping for Tree Diameter Measurements in Inventory OperationsJean-François Tremblay, Martin Béland, François Pomerleau et al.
Forestry is a major industry in many parts of the world. It relies on forest inventory, which consists of measuring tree attributes. We propose to use 3D mapping, based on the iterative closest point algorithm, to automatically measure tree diameters in forests from mobile robot observations. While previous studies showed the potential for such technology, they lacked a rigorous analysis of diameter estimation methods in challenging forest environments. Here, we validated multiple diameter estimation methods, including two novel ones, in a new varied dataset of four different forest sites, 11 trajectories, totaling 1458 tree observations and 1.4 hectares. We provide recommendations for the deployment of mobile robots in a forestry context. We conclude that our mapping method is usable in the context of automated forest inventory, with our best method yielding a root mean square error of 3.45 cm for our whole dataset, and 2.04 cm in ideal conditions consisting of mature forest with well spaced trees.
ROMar 6, 2019
GQ-STN: Optimizing One-Shot Grasp Detection based on Robustness ClassifierAlexandre Gariépy, Jean-Christophe Ruel, Brahim Chaib-draa et al.
Grasping is a fundamental robotic task needed for the deployment of household robots or furthering warehouse automation. However, few approaches are able to perform grasp detection in real time (frame rate). To this effect, we present Grasp Quality Spatial Transformer Network (GQ-STN), a one-shot grasp detection network. Being based on the Spatial Transformer Network (STN), it produces not only a grasp configuration, but also directly outputs a depth image centered at this configuration. By connecting our architecture to an externally-trained grasp robustness evaluation network, we can train efficiently to satisfy a robustness metric via the backpropagation of the gradient emanating from the evaluation network. This removes the difficulty of training detection networks on sparsely annotated databases, a common issue in grasping. We further propose to use this robustness classifier to compare approaches, being more reliable than the traditional rectangle metric. Our GQ-STN is able to detect robust grasps on the depth images of the Dex-Net 2.0 dataset with 92.4 % accuracy in a single pass of the network. We finally demonstrate in a physical benchmark that our method can propose robust grasps more often than previous sampling-based methods, while being more than 60 times faster.
ROOct 2, 2018
Analysis of Robust Functions for Registration AlgorithmsPhilippe Babin, Philippe Giguère, François Pomerleau
Registration accuracy is influenced by the presence of outliers and numerous robust solutions have been developed over the years to mitigate their effect. However, without a large scale comparison of solutions to filter outliers, it is becoming tedious to select an appropriate algorithm for a given application. This paper presents a comprehensive analyses of the effects of outlier filters on the ICP algorithm aimed at mobile robotic application. Fourteen of the most common outlier filters (such as M-estimators) have been tested in different types of environments, for a total of more than two million registrations. Furthermore, the influence of tuning parameters have been thoroughly explored. The experimental results show that most outlier filters have similar performance if they are correctly tuned. Nonetheless, filters such as Var. Trim., Cauchy, and Cauchy MAD are more stable against different environment types. Interestingly, the simple norm L1 produces comparable accuracy, while been parameterless.
ROOct 2, 2018
CELLO-3D: Estimating the Covariance of ICP in the Real WorldDavid Landry, François Pomerleau, Philippe Giguère
The fusion of Iterative Closest Point (ICP) reg- istrations in existing state estimation frameworks relies on an accurate estimation of their uncertainty. In this paper, we study the estimation of this uncertainty in the form of a covariance. First, we scrutinize the limitations of existing closed-form covariance estimation algorithms over 3D datasets. Then, we set out to estimate the covariance of ICP registrations through a data-driven approach, with over 5 100 000 registrations on 1020 pairs from real 3D point clouds. We assess our solution upon a wide spectrum of environments, ranging from structured to unstructured and indoor to outdoor. The capacity of our algorithm to predict covariances is accurately assessed, as well as the usefulness of these estimations for uncertainty estimation over trajectories. The proposed method estimates covariances better than existing closed-form solutions, and makes predictions that are consistent with observed trajectories.
CVJun 18, 2018
Learning Object Localization and 6D Pose Estimation from Simulation and Weakly Labeled Real ImagesJean-Philippe Mercier, Chaitanya Mitash, Philippe Giguère et al.
This work proposes a process for efficiently training a point-wise object detector that enables localizing objects and computing their 6D poses in cluttered and occluded scenes. Accurate pose estimation is typically a requirement for robust robotic grasping and manipulation of objects placed in cluttered, tight environments, such as a shelf with multiple objects. To minimize the human labor required for annotation, the proposed object detector is first trained in simulation by using automatically annotated synthetic images. We then show that the performance of the detector can be substantially improved by using a small set of weakly annotated real images, where a human provides only a list of objects present in each image without indicating the location of the objects. To close the gap between real and synthetic images, we adopt a domain adaptation approach through adversarial training. The detector resulting from this training process can be used to localize objects by using its per-object activation maps. In this work, we use the activation maps to guide the search of 6D poses of objects. Our proposed approach is evaluated on several publicly available datasets for pose estimation. We also evaluated our model on classification and localization in unsupervised and semi-supervised settings. The results clearly indicate that this approach could provide an efficient way toward fully automating the training process of computer vision models used in robotics.
CVMar 2, 2018
Tree Species Identification from Bark Images Using Convolutional Neural NetworksMathieu Carpentier, Philippe Giguère, Jonathan Gaudreault
Tree species identification using bark images is a challenging problem that could prove useful for many forestry related tasks. However, while the recent progress in deep learning showed impressive results on standard vision problems, a lack of datasets prevented its use on tree bark species classification. In this work, we present, and make publicly available, a novel dataset called BarkNet 1.0 containing more than 23,000 high-resolution bark images from 23 different tree species over a wide range of tree diameters. With it, we demonstrate the feasibility of species recognition through bark images, using deep learning. More specifically, we obtain an accuracy of 93.88% on single crop, and an accuracy of 97.81% using a majority voting approach on all of the images of a tree. We also empirically demonstrate that, for a fixed number of images, it is better to maximize the number of tree individuals in the training database, thus directing future data collection efforts.
CVOct 28, 2017
Multi-Task Learning by Deep Collaboration and Application in Facial Landmark DetectionLudovic Trottier, Philippe Giguère, Brahim Chaib-draa
Convolutional neural networks (CNNs) have become the most successful approach in many vision-related domains. However, they are limited to domains where data is abundant. Recent works have looked at multi-task learning (MTL) to mitigate data scarcity by leveraging domain-specific information from related tasks. In this paper, we present a novel soft-parameter sharing mechanism for CNNs in a MTL setting, which we refer to as Deep Collaboration. We propose taking into account the notion that task relevance depends on depth by using lateral transformation blocs with skip connections. This allows extracting task-specific features at various depth without sacrificing features relevant to all tasks. We show that CNNs connected with our Deep Collaboration obtain better accuracy on facial landmark detection with related tasks. We finally verify that our approach effectively allows knowledge sharing by showing depth-specific influence of tasks that we know are related.
ROJun 2, 2016
Dictionary Learning for Robotic Grasp Recognition and DetectionLudovic Trottier, Philippe Giguère, Brahim Chaib-draa
The ability to grasp ordinary and potentially never-seen objects is an important feature in both domestic and industrial robotics. For a system to accomplish this, it must autonomously identify grasping locations by using information from various sensors, such as Microsoft Kinect 3D camera. Despite numerous progress, significant work still remains to be done in this field. To this effect, we propose a dictionary learning and sparse representation (DLSR) framework for representing RGBD images from 3D sensors in the context of determining such good grasping locations. In contrast to previously proposed approaches that relied on sophisticated regularization or very large datasets, the derived perception system has a fast training phase and can work with small datasets. It is also theoretically founded for dealing with masked-out entries, which are common with 3D sensors. We contribute by presenting a comparative study of several DLSR approach combinations for recognizing and detecting grasp candidates on the standard Cornell dataset. Importantly, experimental results show a performance improvement of 1.69% in detection and 3.16% in recognition over current state-of-the-art convolutional neural network (CNN). Even though nowadays most popular vision-based approach is CNN, this suggests that DLSR is also a viable alternative with interesting advantages that CNN has not.
LGMay 30, 2016
Parametric Exponential Linear Unit for Deep Convolutional Neural NetworksLudovic Trottier, Philippe Giguère, Brahim Chaib-draa
Object recognition is an important task for improving the ability of visual systems to perform complex scene understanding. Recently, the Exponential Linear Unit (ELU) has been proposed as a key component for managing bias shift in Convolutional Neural Networks (CNNs), but defines a parameter that must be set by hand. In this paper, we propose learning a parameterization of ELU in order to learn the proper activation shape at each layer in the CNNs. Our results on the MNIST, CIFAR-10/100 and ImageNet datasets using the NiN, Overfeat, All-CNN and ResNet networks indicate that our proposed Parametric ELU (PELU) has better performances than the non-parametric ELU. We have observed as much as a 7.28% relative error improvement on ImageNet with the NiN network, with only 0.0003% parameter increase. Our visual examination of the non-linear behaviors adopted by Vgg using PELU shows that the network took advantage of the added flexibility by learning different activations at different layers.
CVMar 19, 2015
Sign Language Fingerspelling Classification from Depth and Color Images using a Deep Belief NetworkLucas Rioux-Maldague, Philippe Giguère
Automatic sign language recognition is an open problem that has received a lot of attention recently, not only because of its usefulness to signers, but also due to the numerous applications a sign classifier can have. In this article, we present a new feature extraction technique for hand pose recognition using depth and intensity images captured from a Microsoft Kinect sensor. We applied our technique to American Sign Language fingerspelling classification using a Deep Belief Network, for which our feature extraction technique is tailored. We evaluated our results on a multi-user data set with two scenarios: one with all known users and one with an unseen user. We achieved 99% recall and precision on the first, and 77% recall and 79% precision on the second. Our method is also capable of real-time sign classification and is adaptive to any environment or lightning intensity.