An Open Benchmark Dataset for GeoAI Foundation Models for Oil Palm Mapping in Indonesia
This dataset addresses the need for reliable mapping to support sustainability efforts and regulatory frameworks for oil palm cultivation in Indonesia. Its contribution is incremental: it primarily provides new training data rather than novel methods.
The authors address the problem of tracking oil palm-related deforestation in Indonesia by creating an open-access geospatial dataset of oil palm plantations and related land cover types, labeled by experts from high-resolution satellite imagery acquired between 2020 and 2024. The dataset provides polygon-based annotations across a range of agro-ecological zones, with quality ensured through multi-interpreter consensus and field validation.
Oil palm cultivation remains one of the leading causes of deforestation in Indonesia. To better track and address this challenge, detailed and reliable mapping is needed to support sustainability efforts and emerging regulatory frameworks. We present an open-access geospatial dataset of oil palm plantations and related land cover types in Indonesia, produced through expert labeling of high-resolution satellite imagery from 2020 to 2024. The dataset provides polygon-based, wall-to-wall annotations across a range of agro-ecological zones and includes a hierarchical typology that distinguishes oil palm planting stages as well as similar perennial crops. Quality was ensured through multi-interpreter consensus and field validation. Because the annotations were digitized wall-to-wall over large grids, the dataset is suitable for training and benchmarking both conventional convolutional neural networks and newer geospatial foundation models. Released under a CC-BY license, it fills a key gap in training data for remote sensing and aims to improve the accuracy of land cover mapping. By supporting transparent monitoring of oil palm expansion, the resource contributes to global deforestation reduction goals and follows FAIR data principles.
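As a rough illustration of how a hierarchical typology can be used in benchmarking, the sketch below scores the same predictions at the fine level (planting stages, individual crops) and at the parent level (oil palm versus other perennials). The class names and the two-level structure are hypothetical placeholders chosen for this example; they are not the dataset's actual legend.

```python
# Hypothetical two-level typology: fine label -> parent class.
# These names are illustrative assumptions, not the dataset's real classes.
PARENT = {
    "oil_palm_immature": "oil_palm",
    "oil_palm_mature": "oil_palm",
    "coconut": "other_perennial",
    "rubber": "other_perennial",
    "forest": "non_plantation",
}


def to_parent(labels):
    """Collapse fine labels to their parent class (KeyError on unknown labels)."""
    return [PARENT[label] for label in labels]


def accuracy(truth, predicted):
    """Fraction of positions where the two label sequences agree."""
    if len(truth) != len(predicted):
        raise ValueError("label sequences must have the same length")
    if not truth:
        raise ValueError("label sequences must not be empty")
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


# Toy reference and predicted labels for four polygons.
truth = ["oil_palm_mature", "oil_palm_immature", "coconut", "forest"]
predicted = ["oil_palm_immature", "oil_palm_immature", "rubber", "forest"]

fine_acc = accuracy(truth, predicted)
coarse_acc = accuracy(to_parent(truth), to_parent(predicted))

print(f"fine-level accuracy:   {fine_acc:.2f}")
print(f"parent-level accuracy: {coarse_acc:.2f}")
```

In this toy case the model confuses planting stages and perennial crop types but never confuses oil palm with other classes, so the parent-level score is higher than the fine-level score. Reporting both levels separates errors within a class family from errors across families.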