Effective Utilisation of Multiple Open-Source Datasets to Improve Generalisation Performance of Point Cloud Segmentation Models
This work addresses generalisation issues in aerial point cloud segmentation for applications such as drone- or plane-based sensing, but it is incremental, building on existing methods through dataset combination and sampling strategies.
The paper tackles poor generalisation in point cloud segmentation models by training on multiple open-source datasets. It shows that a naive combination of datasets improves generalisation, that an improved sampling strategy boosts performance substantially further, and that consistent point densities are the most important factor.
Semantic segmentation of aerial point cloud data can be utilised to differentiate which points belong to classes such as ground, buildings, or vegetation. Point clouds from aerial sensors mounted on drones or planes can be generated using LIDAR sensors or cameras combined with photogrammetry. Each method of data collection has unique characteristics which can be learnt independently with state-of-the-art point cloud segmentation models. A single point cloud segmentation model is desirable in situations where point cloud sensors, quality, and structures can change, provided that the model handles these variations with predictable and consistent results. Although deep learning can segment point clouds accurately, it often generalises poorly, adapting badly to data which differs from the training data. To address this issue, we propose to utilise multiple available open-source, fully annotated datasets to train and test models that are better able to generalise. In this paper we discuss the combination of these datasets into a simple training set and a challenging test set. Combining datasets allows us to evaluate generalisation performance on known variations in the point cloud data. We show that a naive combination of datasets produces a model with improved generalisation performance, as expected. We go on to show that an improved sampling strategy which decreases sampling variations increases generalisation performance substantially on top of this. Experiments to determine which sampling variations give this performance boost found that consistent densities are the most important.
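As a rough illustration of what "consistent densities" could mean in practice, the sketch below resamples point clouds from different sources onto a common voxel grid, keeping one point per occupied voxel. This is an assumption made for illustration only: the abstract does not state which sampling method the authors use, and the names `voxel_downsample`, `normalise_density`, and `voxel_size` are hypothetical.

```python
import numpy as np


def voxel_downsample(points, voxel_size):
    """Keep one point per occupied voxel (illustrative, not the paper's method).

    points: (N, C) array whose first three columns are x, y, z.
    voxel_size: edge length of the cubic voxels, in the same units as x, y, z.
    """
    # Integer voxel coordinates for every point.
    voxel_idx = np.floor(points[:, :3] / voxel_size).astype(np.int64)
    # Index of the first point found in each occupied voxel.
    _, keep = np.unique(voxel_idx, axis=0, return_index=True)
    # Sort so the surviving points keep their original order.
    return points[np.sort(keep)]


def normalise_density(clouds, voxel_size):
    """Apply the same voxel grid to every dataset in a dict of name -> points."""
    return {name: voxel_downsample(pts, voxel_size) for name, pts in clouds.items()}
```

With a shared `voxel_size`, a dense photogrammetry cloud and a sparser LIDAR cloud covering the same area end up with a similar upper bound on points per unit volume, which is one simple way to reduce density variation between datasets before training.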