CVDec 19, 2022
Leveraging Road Area Semantic Segmentation with Auxiliary Steering TaskJyri Maanpää, Iaroslav Melekhov, Josef Taher et al.
Robustness of different pattern recognition methods is one of the key challenges in autonomous driving, especially when driving in the high variety of road environments and weather conditions, such as gravel roads and snowfall. Although one can collect data from these adverse conditions using cars equipped with sensors, it is quite tedious to annotate the data for training. In this work, we address this limitation and propose a CNN-based method that can leverage the steering wheel angle information to improve the road area semantic segmentation. As the steering wheel angle data can be easily acquired with the associated images, one could improve the accuracy of road area semantic segmentation by collecting data in new road environments without manual data annotation. We demonstrate the effectiveness of the proposed approach on two challenging data sets for autonomous driving and show that when the steering task is used in our segmentation model training, it leads to a 0.1-2.9% gain in the road area mIoU (mean Intersection over Union) compared to the corresponding reference transfer learning model.
CVNov 1, 2025
Benchmarking individual tree segmentation using multispectral airborne laser scanning data: the FGI-EMIT datasetLassi Ruoppa, Tarmo Hietala, Verneri Seppänen et al.
Individual tree segmentation (ITS) from LiDAR point clouds is fundamental for applications such as forest inventory, carbon monitoring and biodiversity assessment. Traditionally, ITS has been achieved with unsupervised geometry-based algorithms, while more recent advances have shifted toward supervised deep learning (DL). In the past, progress in method development was hindered by the lack of large-scale benchmark datasets, and the availability of novel data formats, particularly multispectral (MS) LiDAR, remains limited to this day, despite evidence that MS reflectance can improve the accuracy of ITS. This study introduces FGI-EMIT, the first large-scale MS airborne laser scanning benchmark dataset for ITS. Captured at wavelengths 532, 905, and 1,550 nm, the dataset consists of 1,561 manually annotated trees, with a particular focus on small understory trees. Using FGI-EMIT, we comprehensively benchmarked four conventional unsupervised algorithms and four supervised DL approaches. Hyperparameters of unsupervised methods were optimized using a Bayesian approach, while DL models were trained from scratch. Among the unsupervised methods, Treeiso achieved the highest test set F1-score of 52.7%. The DL approaches performed significantly better overall, with the best model, ForestFormer3D, attaining an F1-score of 73.3%. The most significant difference was observed in understory trees, where ForestFormer3D exceeded Treeiso by 25.9 percentage points. An ablation study demonstrated that current DL-based approaches generally fail to leverage MS reflectance information when it is provided as additional input features, although single channel reflectance can improve accuracy marginally, especially for understory trees. A performance analysis across point densities further showed that DL methods consistently remain superior to unsupervised algorithms, even at densities as low as 10 points/m$^2$.
CVMay 21
3D LULC classification using multispectral LiDAR and deep learning: current and prospective schemesNarges Takhtkeshha, Aldino Rizaldy, Markus Hollaus et al.
Land Use Land Cover (LULC) classification is essential for national 3D mapping, geospatial analysis, and sustainable planning. Multispectral (MS) LiDAR provides synchronized spatial-spectral information, and deep learning (DL) enables 3D point cloud semantic segmentation; however, adoption is limited by the lack of publicly available urban and suburban MS LiDAR datasets aligned with National Mapping and Cadastral Agencies (NMCAs) classification schemes. This study addresses these gaps by introducing L1 and L2 NMCA-aligned LULC classification schemes and a new benchmark MS LiDAR dataset. We evaluate seven state-of-the-art DL models and perform spectral ablation studies at both levels of detail. Results show that Point Transformer V3 achieves the best performance, with mIoU of 79.4% (L1, 8 classes) and 58.9% (L2, 20 classes) using a dual-wavelength LiDAR system (532 nm and 1064 nm). Ablation results show that multispectral information improves performance over geometry-only inputs, with gains of 1.1 percentage points at L1 and 7.8 points at L2. These results highlight the value of LiDAR reflectance for fine-grained material discrimination and support the evolution of NMCA LULC schemes toward higher semantic detail. The Loosdorf-MSL dataset contributes a new benchmark for consistent national and international LULC mapping.
CVApr 27
Multispectral airborne laser scanning dataset for tree species classification: MS-ALS-SPECIESMatti Hyyppä, Klaara Salolahti, Eric Hyyppä et al.
The shift from stand-level to individual-tree-level forest assessments supports improved biodiversity mapping, particularly in boreal ecosystems where tree species like aspen (Populus tremula L.) play a keystone role. While airborne laser scanning (ALS) is the standard for such inventories, a major limitation is the small number of publicly available ALS datasets containing high-quality, field-validated reference data. Furthermore, open multispectral ALS datasets with high-quality field reference data are completely lacking despite the potential of multispectral ALS data for tree species classification. This paper presents and details an open multispectral ALS dataset used in a recent international benchmarking study of machine learning and deep learning methods for tree species classification by Taher et al. (2026). The dataset comprises 6326 segment-level point clouds of individual trees representing nine species in Southern Finland. The point cloud data has been acquired using two multispectral laser scanning systems each operating at three laser wavelengths: a helicopter-borne system (HeliALS) with a point density exceeding 1000 points/m$^2$ and an Optech Titan system with approximately 35 points/m$^2$. We provide a detailed description of field data collection techniques developed in the study to facilitate the collection of high-quality ground truth data in an efficient and scalable manner. Additionally, our article presents new analyses on species classification using multispectral data building upon the initial findings of Taher et al. (2026). Furthermore, we study the relation between classification accuracy and tree height to highlight the versatility of the open dataset and to demonstrate the advantage of the point transformer model for small trees and minority species.
CVFeb 10, 2025
Unsupervised deep learning for semantic segmentation of multispectral LiDAR forest point cloudsLassi Ruoppa, Oona Oinonen, Josef Taher et al.
Point clouds captured with laser scanning systems from forest environments can be utilized in a wide variety of applications within forestry and plant ecology, such as the estimation of tree stem attributes, leaf angle distribution, and above-ground biomass. However, effectively utilizing the data in such tasks requires the semantic segmentation of the data into wood and foliage points, also known as leaf-wood separation. The traditional approach to leaf-wood separation has been geometry- and radiometry-based unsupervised algorithms, which tend to perform poorly on data captured with airborne laser scanning (ALS) systems, even with a high point density. While recent machine and deep learning approaches achieve great results even on sparse point clouds, they require manually labeled training data, which is often extremely laborious to produce. Multispectral (MS) information has been demonstrated to have potential for improving the accuracy of leaf-wood separation, but quantitative assessment of its effects has been lacking. This study proposes a fully unsupervised deep learning method, GrowSP-ForMS, which is specifically designed for leaf-wood separation of high-density MS ALS point clouds and based on the GrowSP architecture. GrowSP-ForMS achieved a mean accuracy of 84.3% and a mean intersection over union (mIoU) of 69.6% on our MS test set, outperforming the unsupervised reference methods by a significant margin. When compared to supervised deep learning methods, our model performed similarly to the slightly older PointNet architecture but was outclassed by more recent approaches. Finally, two ablation studies were conducted, which demonstrated that our proposed changes increased the test set mIoU of GrowSP-ForMS by 29.4 percentage points (pp) in comparison to the original GrowSP model and that utilizing MS data improved the mIoU by 5.6 pp from the monospectral case.
CVOct 24, 2024
Unsupervised semantic segmentation of urban high-density multispectral point cloudsOona Oinonen, Lassi Ruoppa, Josef Taher et al.
The availability of highly accurate urban airborne laser scanning (ALS) data will increase rapidly in the future, especially as acquisition costs decrease, for example through the use of drones. Current challenges in data processing are related to the limited spectral information and low point density of most ALS datasets. Another challenge will be the growing need for annotated training data, frequently produced by manual processes, to enable semantic interpretation of point clouds. This study proposes to semantically segment new high-density (1200 points per square metre on average) multispectral ALS data with an unsupervised ground-aware deep clustering method GroupSP inspired by the unsupervised GrowSP algorithm. GroupSP divides the scene into superpoints as a preprocessing step. The neural network is trained iteratively by grouping the superpoints and using the grouping assignments as pseudo-labels. The predictions for the unseen data are given by over-segmenting the test set and mapping the predicted classes into ground truth classes manually or with automated majority voting. GroupSP obtained an overall accuracy (oAcc) of 97% and a mean intersection over union (mIoU) of 80%. When compared to other unsupervised semantic segmentation methods, GroupSP outperformed GrowSP and non-deep K-means. However, a supervised random forest classifier outperformed GroupSP. The labelling efforts in GroupSP can be minimal; it was shown, that the GroupSP can semantically segment seven urban classes (building, high vegetation, low vegetation, asphalt, rock, football field, and gravel) with oAcc of 95% and mIoU of 75% using only 0.004% of the available annotated points in the mapping assignment. Finally, the multispectral information was examined; adding each new spectral channel improved the mIoU. Additionally, echo deviation was valuable, especially when distinguishing ground-level classes.
IVJul 8, 2025
3D forest semantic segmentation using multispectral LiDAR and 3D deep learningNarges Takhtkeshha, Lauris Bocaux, Lassi Ruoppa et al.
Conservation and decision-making regarding forest resources necessitate regular forest inventory. Light detection and ranging (LiDAR) in laser scanning systems has gained significant attention over the past two decades as a remote and non-destructive solution to streamline the labor-intensive and time-consuming procedure of forest inventory. Advanced multispectral (MS) LiDAR systems simultaneously acquire three-dimensional (3D) spatial and spectral information across multiple wavelengths of the electromagnetic spectrum. Consequently, MS-LiDAR technology enables the estimation of both the biochemical and biophysical characteristics of forests. Forest component segmentation is crucial for forest inventory. The synergistic use of spatial and spectral laser information has proven to be beneficial for achieving precise forest semantic segmentation. Thus, this study aims to investigate the potential of MS-LiDAR data, captured by the HeliALS system, providing high-density multispectral point clouds to segment forests into six components: ground, low vegetation, trunks, branches, foliage, and woody debris. Three point-wise 3D deep learning models and one machine learning model, including kernel point convolution, superpoint transformer, point transformer V3, and random forest, are implemented. Our experiments confirm the superior accuracy of the KPConv model. Additionally, various geometric and spectral feature vector scenarios are examined. The highest accuracy is achieved by feeding all three wavelengths (1550 nm, 905 nm, and 532 nm) as the initial features into the deep learning model, resulting in improvements of 33.73% and 32.35% in mean intersection over union (mIoU) and in mean accuracy (mAcc), respectively. This study highlights the excellent potential of multispectral LiDAR for improving the accuracy in fully automated forest component segmentation.
CVApr 19, 2025
Multispectral airborne laser scanning for tree species classification: a benchmark of machine learning and deep learning algorithmsJosef Taher, Eric Hyyppä, Matti Hyyppä et al.
Climate-smart and biodiversity-preserving forestry demands precise information on forest resources, extending to the individual tree level. Multispectral airborne laser scanning (ALS) has shown promise in automated point cloud processing and tree segmentation, but challenges remain in identifying rare tree species and leveraging deep learning techniques. This study addresses these gaps by conducting a comprehensive benchmark of machine learning and deep learning methods for tree species classification. For the study, we collected high-density multispectral ALS data (>1000 pts/m$^2$) at three wavelengths using the FGI-developed HeliALS system, complemented by existing Optech Titan data (35 pts/m$^2$), to evaluate the species classification accuracy of various algorithms in a test site located in Southern Finland. Based on 5261 test segments, our findings demonstrate that point-based deep learning methods, particularly a point transformer model, outperformed traditional machine learning and image-based deep learning approaches on high-density multispectral point clouds. For the high-density ALS dataset, a point transformer model provided the best performance reaching an overall (macro-average) accuracy of 87.9% (74.5%) with a training set of 1065 segments and 92.0% (85.1%) with 5000 training segments. The best image-based deep learning method, DetailView, reached an overall (macro-average) accuracy of 84.3% (63.9%), whereas a random forest (RF) classifier achieved an overall (macro-average) accuracy of 83.2% (61.3%). Importantly, the overall classification accuracy of the point transformer model on the HeliALS data increased from 73.0% with no spectral information to 84.7% with single-channel reflectance, and to 87.9% with spectral information of all the three channels.
CVApr 26, 2024
Dense Road Surface Grip Map Prediction from Multimodal Image DataJyri Maanpää, Julius Pesonen, Heikki Hyyti et al.
Slippery road weather conditions are prevalent in many regions and cause a regular risk for traffic. Still, there has been less research on how autonomous vehicles could detect slippery driving conditions on the road to drive safely. In this work, we propose a method to predict a dense grip map from the area in front of the car, based on postprocessed multimodal sensor data. We trained a convolutional neural network to predict pixelwise grip values from fused RGB camera, thermal camera, and LiDAR reflectance images, based on weakly supervised ground truth from an optical road weather sensor. The experiments show that it is possible to predict dense grip values with good accuracy from the used data modalities as the produced grip map follows both ground truth measurements and local weather conditions, such as snowy areas on the road. The model using only the RGB camera or LiDAR reflectance modality provided good baseline results for grip prediction accuracy while using models fusing the RGB camera, thermal camera, and LiDAR modalities improved the grip predictions significantly.
CVDec 5, 2025
NormalView: sensor-agnostic tree species classification from backpack and aerial lidar data using geometric projectionsJuho Korkeala, Jesse Muhojoki, Josef Taher et al.
Laser scanning has proven to be an invaluable tool in assessing the decomposition of forest environments. Mobile laser scanning (MLS) has shown to be highly promising for extremely accurate, tree level inventory. In this study, we present NormalView, a sensor-agnostic projection-based deep learning method for classifying tree species from point cloud data. NormalView embeds local geometric information into two-dimensional projections, in the form of normal vector estimates, and uses the projections as inputs to an image classification network, YOLOv11. In addition, we inspected the effect of multispectral radiometric intensity information on classification performance. We trained and tested our model on high-density MLS data (7 species, ~5000 pts/m^2), as well as high-density airborne laser scanning (ALS) data (9 species, >1000 pts/m^2). On the MLS data, NormalView achieves an overall accuracy (macro-average accuracy) of 95.5 % (94.8 %), and 91.8 % (79.1 %) on the ALS data. We found that having intensity information from multiple scanners provides benefits in tree species classification, and the best model on the multispectral ALS dataset was a model using intensity information from all three channels of the multispectral ALS. This study demonstrates that projection-based methods, when enhanced with geometric information and coupled with state-of-the-art image classification backbones, can achieve exceptional results. Crucially, these methods are sensor-agnostic, relying only on geometric information. Additionally, we publically release the MLS dataset used in the study.
CVAug 27, 2025
Multispectral LiDAR data for extracting tree points in urban and suburban areasNarges Takhtkeshha, Gabriele Mazzacca, Fabio Remondino et al.
Monitoring urban tree dynamics is vital for supporting greening policies and reducing risks to electrical infrastructure. Airborne laser scanning has advanced large-scale tree management, but challenges remain due to complex urban environments and tree variability. Multispectral (MS) light detection and ranging (LiDAR) improves this by capturing both 3D spatial and spectral data, enabling detailed mapping. This study explores tree point extraction using MS-LiDAR and deep learning (DL) models. Three state-of-the-art models are evaluated: Superpoint Transformer (SPT), Point Transformer V3 (PTv3), and Point Transformer V1 (PTv1). Results show the notable time efficiency and accuracy of SPT, with a mean intersection over union (mIoU) of 85.28%. The highest detection accuracy is achieved by incorporating pseudo normalized difference vegetation index (pNDVI) with spatial data, reducing error rate by 10.61 percentage points (pp) compared to using spatial information alone. These findings highlight the potential of MS-LiDAR and DL to improve tree extraction and further tree inventories.
CVApr 11, 2025
Road Grip Uncertainty Estimation Through Surface State SegmentationJyri Maanpää, Julius Pesonen, Iaroslav Melekhov et al.
Slippery road conditions pose significant challenges for autonomous driving. Beyond predicting road grip, it is crucial to estimate its uncertainty reliably to ensure safe vehicle control. In this work, we benchmark several uncertainty prediction methods to assess their effectiveness for grip uncertainty estimation. Additionally, we propose a novel approach that leverages road surface state segmentation to predict grip uncertainty. Our method estimates a pixel-wise grip probability distribution based on inferred road surface conditions. Experimental results indicate that the proposed approach enhances the robustness of grip uncertainty prediction.
CVOct 28, 2020
Multimodal End-to-End Learning for Autonomous Steering in Adverse Road and Weather ConditionsJyri Maanpää, Josef Taher, Petri Manninen et al.
Autonomous driving is challenging in adverse road and weather conditions in which there might not be lane lines, the road might be covered in snow and the visibility might be poor. We extend the previous work on end-to-end learning for autonomous steering to operate in these adverse real-life conditions with multimodal data. We collected 28 hours of driving data in several road and weather conditions and trained convolutional neural networks to predict the car steering wheel angle from front-facing color camera images and lidar range and reflectance data. We compared the CNN model performances based on the different modalities and our results show that the lidar modality improves the performances of different multimodal sensor-fusion models. We also performed on-road tests with different models and they support this observation.
CVJan 6, 2020
CAE-LO: LiDAR Odometry Leveraging Fully Unsupervised Convolutional Auto-Encoder for Interest Point Detection and Feature DescriptionDeyu Yin, Qian Zhang, Jingbin Liu et al.
As an important technology in 3D mapping, autonomous driving, and robot navigation, LiDAR odometry is still a challenging task. Appropriate data structure and unsupervised deep learning are the keys to achieve an easy adjusted LiDAR odometry solution with high performance. Utilizing compact 2D structured spherical ring projection model and voxel model which preserves the original shape of input data, we propose a fully unsupervised Convolutional Auto-Encoder based LiDAR Odometry (CAE-LO) that detects interest points from spherical ring data using 2D CAE and extracts features from multi-resolution voxel model using 3D CAE. We make several key contributions: 1) experiments based on KITTI dataset show that our interest points can capture more local details to improve the matching success rate on unstructured scenarios and our features outperform state-of-the-art by more than 50% in matching inlier ratio; 2) besides, we also propose a keyframe selection method based on matching pairs transferring, an odometry refinement method for keyframes based on extended interest points from spherical rings, and a backward pose update method. The odometry refinement experiments verify the proposed ideas' feasibility and effectiveness.