Spatial machine-learning model diagnostics: a model-agnostic distance-based approach
This work addresses the interpretation and assessment of spatial ML models for researchers and practitioners in fields such as environmental science and remote sensing, and offers incremental improvements over existing diagnostic methods.
The paper addresses the lack of diagnostic tools for understanding the spatial behavior of machine-learning models. It proposes spatial prediction error profiles (SPEPs) and spatial variable importance profiles (SVIPs) as model-agnostic tools and applies them in environmental and remote-sensing case studies, where they reveal differences and similarities among various models.
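The summary does not say how the profiles are estimated. The following is a minimal sketch under two assumptions, and it is not necessarily the paper's exact procedure. First, an SPEP is approximated by buffered leave-one-out cross-validation: for each test point, only observations farther than a prediction distance d are used for training. Second, an SVIP is the increase in RMSE at each d when one predictor is permuted across test points. The ordinary-least-squares stand-in model, the function names `fit_ols`, `predict_ols` and `spep_svip`, and the synthetic data are all hypothetical. Any model with a fit/predict pair could replace OLS, which is what makes the idea model-agnostic.

```python
import numpy as np

def fit_ols(X, y):
    """Hypothetical stand-in model: ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def spep_svip(coords, X, y, distances, seed=0):
    """Assumed estimator: buffered leave-one-out RMSE (SPEP) and
    permutation importance (SVIP) at each prediction distance d."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Pairwise Euclidean distances between all observation locations
    D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    rmse = np.full(len(distances), np.nan)
    importance = np.full((len(distances), p), np.nan)
    for k, d in enumerate(distances):
        preds = np.full(n, np.nan)
        perm_preds = np.full((n, p), np.nan)
        perm = rng.permutation(n)  # shuffles predictor values across test points
        for i in range(n):
            train = D[i] > d      # drop every point within distance d of point i
            train[i] = False      # always exclude the test point itself
            if train.sum() <= p + 1:
                continue          # too few points left to fit the model
            coef = fit_ols(X[train], y[train])
            preds[i] = predict_ols(coef, X[i:i + 1])[0]
            for j in range(p):
                x_perm = X[i].copy()
                x_perm[j] = X[perm[i], j]  # permuted value of predictor j
                perm_preds[i, j] = predict_ols(coef, x_perm[None, :])[0]
        ok = ~np.isnan(preds)
        if not ok.any():
            continue
        rmse[k] = np.sqrt(np.mean((y[ok] - preds[ok]) ** 2))
        for j in range(p):
            rmse_perm = np.sqrt(np.mean((y[ok] - perm_preds[ok, j]) ** 2))
            importance[k, j] = rmse_perm - rmse[k]  # SVIP: loss increase
    return rmse, importance

# Hypothetical synthetic example: one spatially trending and one noise-like predictor
data_rng = np.random.default_rng(42)
n = 150
coords = data_rng.uniform(0, 100, size=(n, 2))
X = np.column_stack([coords[:, 0] / 100, data_rng.normal(size=n)])
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + data_rng.normal(scale=0.1, size=n)

rmse, imp = spep_svip(coords, X, y, distances=[0, 10, 20, 40])
print("SPEP (RMSE by distance):", np.round(rmse, 3))
print("SVIP (RMSE increase by distance and predictor):\n", np.round(imp, 3))
```

Plotting `rmse` and each column of `imp` against `distances` yields the two profiles. With a more flexible model such as random forest, the profiles would typically change with d. The correctly specified linear toy model used here is expected to stay close to flat.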
While significant progress has been made towards explaining black-box machine-learning (ML) models, there is still a distinct lack of diagnostic tools that elucidate the spatial behaviour of ML models in terms of predictive skill and variable importance. This contribution proposes spatial prediction error profiles (SPEPs) and spatial variable importance profiles (SVIPs) as novel model-agnostic assessment and interpretation tools for spatial prediction models with a focus on prediction distance. Their suitability is demonstrated in two case studies representing a regionalization task in an environmental-science context, and a remotely sensed land-cover classification task. In these case studies, the SPEPs and SVIPs of geostatistical methods, linear models, random forest, and hybrid algorithms show striking differences but also relevant similarities. Limitations of related cross-validation techniques are outlined, and the case is made that modelers should focus their model assessment and interpretation on the intended spatial prediction horizon. The range of autocorrelation, in contrast, is not a suitable criterion for defining spatial cross-validation test sets. The novel diagnostic tools enrich the toolkit of spatial data science, and may improve ML model interpretation, selection, and design.
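The abstract recommends that assessment focus on the intended spatial prediction horizon rather than on the autocorrelation range. One way to make that horizon concrete, assumed here for illustration, is to compute the distance from each intended prediction location to its nearest training observation. The function name `prediction_distances`, the random training sites, the regular prediction grid, and the choice of summarizing by the median and 90th percentile are all hypothetical.

```python
import numpy as np

def prediction_distances(pred_coords, train_coords):
    """Distance from each prediction location to its nearest training location."""
    diff = pred_coords[:, None, :] - train_coords[None, :, :]
    return np.linalg.norm(diff, axis=-1).min(axis=1)

# Hypothetical example: sparse training sites and a dense regular prediction grid
rng = np.random.default_rng(1)
train_coords = rng.uniform(0, 100, size=(50, 2))
gx, gy = np.meshgrid(np.linspace(0, 100, 51), np.linspace(0, 100, 51))
pred_coords = np.column_stack([gx.ravel(), gy.ravel()])

nn_dist = prediction_distances(pred_coords, train_coords)
# Assumed summary of the prediction horizon: median and 90th-percentile distance,
# which could bound the distances at which SPEPs and SVIPs are evaluated
horizon = np.quantile(nn_dist, [0.5, 0.9])
print("median / 90th percentile prediction distance:", np.round(horizon, 2))
```

This quantity depends only on the sampling design and the prediction domain, not on the variogram. That is consistent with the abstract's point that the autocorrelation range is not what should define the test sets.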