CV | AI | LG | Mar 14, 2025

Mitigating Bad Ground Truth in Supervised Machine Learning based Crop Classification: A Multi-Level Framework with Sentinel-2 Images

arXiv:2503.11807v1 | h-index: 6 | 2024 IEEE India Geoscience and Remote Sensing Symposium (InGARSS)
Originality: Incremental advance
AI Analysis

This work addresses crop classification accuracy for agricultural management, though it is incremental as it focuses on data cleaning rather than novel model development.

The paper tackles inaccurate ground truth data in supervised machine learning for crop classification by proposing a multi-level cleaning framework based on Sentinel-2 images. Training a Random Forest model on the cleaned data yields F1 scores up to 70 percentage points higher than training on the uncleaned data.

In agricultural management, precise Ground Truth (GT) data is crucial for accurate Machine Learning (ML) based crop classification. Yet issues like crop mislabeling and incorrect land identification are common. We propose a multi-level GT cleaning framework that uses multi-temporal Sentinel-2 data to address these issues. Specifically, the framework generates embeddings for farmland, clusters similar crop profiles, and identifies outliers that indicate GT errors. We validated clusters with False Colour Composite (FCC) checks and used distance-based metrics to scale and automate this verification process. The importance of cleaning the GT data became apparent when models were trained on the clean and unclean data. For instance, when we trained a Random Forest model with the clean GT data, we achieved an F1 score up to 70 percentage points higher. This approach advances crop classification methodologies, with potential applications in loan underwriting and agricultural decision-making.
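The distance-based outlier step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the embedding model, the clustering algorithm, or the distance threshold. The sketch assumes each plot is represented by a short NDVI-like time series (standing in for a learned embedding), groups plots by their reported crop label (standing in for clustering), and flags plots whose Euclidean distance to their group centroid exceeds the median distance plus `k` times the median absolute deviation. The function name `flag_suspect_labels`, the parameter `k`, and all data values are hypothetical.

```python
import math
from statistics import median


def centroid(profiles):
    """Element-wise mean of equal-length time-series profiles."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]


def flag_suspect_labels(profiles, labels, k=3.0):
    """Return sorted indices of plots far from their own label's centroid.

    Assumption (not from the paper): a plot is suspect when its distance
    exceeds median + k * MAD of the distances within its label group.
    """
    by_label = {}
    for idx, label in enumerate(labels):
        by_label.setdefault(label, []).append(idx)

    suspects = []
    for idxs in by_label.values():
        center = centroid([profiles[i] for i in idxs])
        dists = {i: math.dist(profiles[i], center) for i in idxs}
        med = median(dists.values())
        mad = median(abs(d - med) for d in dists.values())
        threshold = med + k * mad
        suspects.extend(i for i in idxs if dists[i] > threshold)
    return sorted(suspects)


# Hypothetical NDVI-like profiles over five acquisition dates.
profiles = [
    [0.20, 0.50, 0.80, 0.60, 0.30],  # 0: wheat
    [0.25, 0.55, 0.75, 0.60, 0.30],  # 1: wheat
    [0.20, 0.45, 0.80, 0.65, 0.35],  # 2: wheat
    [0.15, 0.50, 0.85, 0.60, 0.30],  # 3: wheat
    [0.20, 0.50, 0.80, 0.55, 0.25],  # 4: wheat
    [0.15, 0.15, 0.20, 0.15, 0.15],  # 5: labeled wheat, but flat like bare land
    [0.30, 0.40, 0.60, 0.80, 0.70],  # 6: rice
    [0.30, 0.45, 0.60, 0.80, 0.75],  # 7: rice
    [0.35, 0.40, 0.65, 0.80, 0.70],  # 8: rice
    [0.30, 0.40, 0.60, 0.85, 0.70],  # 9: rice
]
labels = ["wheat"] * 6 + ["rice"] * 4

suspects = flag_suspect_labels(profiles, labels)
print(suspects)  # [5]
```

In this toy data only plot 5 is flagged: its flat profile sits far from the wheat centroid, while the ordinary spread among the other plots stays under the threshold. A median-based threshold is used here because a single gross outlier would inflate a mean and standard deviation computed over such a small group.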

