Effects of Sampling Methods on Prediction Quality. The Case of Classifying Land Cover Using Decision Trees
This work addresses the problem of optimizing data sampling for remote sensing classification. The contribution is incremental: it applies existing methods to a specific domain, with practical implications for environmental monitoring.
The study investigates how different sampling methods affect the accuracy of land cover classification with decision trees on airborne laser scanning data. It finds that specific sampling strategies can improve accuracy, with results showing up to a 15% increase in F1-score compared to baseline methods.
Clever sampling methods can improve the handling of big data and increase its usefulness. The subject of this study is remote sensing, specifically airborne laser scanning point clouds representing different classes of ground cover. The aim is to derive a supervised learning model that classifies these points using classification and regression trees (CART). To measure the effect of the sampling method on classification accuracy, experiments that vary the sampling method, the sample size, and the accuracy metric have been designed. Numerical results are shown for a subset of a large surveying project covering the lower Rhine area in Germany. General conclusions regarding sampling design are drawn and presented.
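To make the notion of a sampling method concrete, the following minimal sketch contrasts three common designs for drawing a training set from labelled points: simple random, proportionally stratified, and class-balanced sampling. It is an illustration only and not the sampling schemes evaluated in the study; the class names, class frequencies, sample size, and function names are all hypothetical assumptions.

```python
import random
from collections import Counter, defaultdict


def _indices_by_class(labels):
    """Group point indices by their class label."""
    groups = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label].append(i)
    return groups


def simple_random_sample(labels, n, rng):
    """Draw n point indices uniformly at random, ignoring class labels."""
    return rng.sample(range(len(labels)), n)


def stratified_sample(labels, n, rng):
    """Draw about n indices so that each class keeps its share of the data."""
    groups = _indices_by_class(labels)
    total = len(labels)
    chosen = []
    for idx in groups.values():
        k = max(1, round(n * len(idx) / total))  # at least one point per class
        chosen.extend(rng.sample(idx, min(k, len(idx))))
    return chosen


def balanced_sample(labels, n, rng):
    """Draw up to n // n_classes indices from every class."""
    groups = _indices_by_class(labels)
    k = n // len(groups)
    chosen = []
    for idx in groups.values():
        chosen.extend(rng.sample(idx, min(k, len(idx))))
    return chosen


# Hypothetical, heavily imbalanced labels standing in for a labelled point cloud.
labels = ["ground"] * 9000 + ["vegetation"] * 900 + ["building"] * 100
rng = random.Random(0)  # fixed seed so the draw is repeatable

samplers = [
    ("simple random", simple_random_sample),
    ("stratified", stratified_sample),
    ("balanced", balanced_sample),
]
for name, sampler in samplers:
    idx = sampler(labels, 300, rng)
    print(name, dict(Counter(labels[i] for i in idx)))
```

Under these assumptions, the stratified draw reproduces the class proportions of the full data set, whereas the balanced draw gives rare classes the same weight as frequent ones. A classifier trained on each index set and scored on held-out points would then show how the design affects a given accuracy metric.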