CV, LG · Jan 26

On Procrustes Contamination in Machine Learning Applications of Geometric Morphometrics

arXiv:2601.18448v1 · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses a preprocessing issue faced by researchers who use geometric morphometrics in machine learning and provides practical guidelines, but it is an incremental advance because it focuses on a specific methodological refinement.

The paper tackles statistical contamination in machine learning applications of geometric morphometrics, caused by the standard practice of aligning all specimens with Generalized Procrustes Analysis before the train/test split. It proposes a realignment procedure that eliminates cross-sample dependency, and its simulations show robust error-scaling patterns and performance degradation when landmark relationships are ignored.
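The contamination mechanism and the train-only alternative can be illustrated with a minimal NumPy sketch. It assumes partial GPA (unit centroid size, rotation-only superimposition with reflections excluded) and simulated 2D landmarks; the helper names `center_and_scale`, `rotate_onto`, `gpa` and `align_test_to_train` are hypothetical, and this is not the authors' implementation, whose realignment details (e.g. full vs. partial Procrustes, tangent-space projection) may differ. The sketch shows that when GPA is run on the full sample, swapping the test specimens changes the aligned training coordinates, whereas aligning test specimens to a fixed training consensus leaves the training data untouched.

```python
import numpy as np


def center_and_scale(X):
    """Translate a (k, m) configuration to its centroid and scale it to unit centroid size."""
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc)


def rotate_onto(X, ref):
    """Rotate a centred, scaled X onto ref (orthogonal Procrustes, reflections excluded)."""
    U, _, Vt = np.linalg.svd(X.T @ ref)
    D = np.eye(X.shape[1])
    D[-1, -1] = np.sign(np.linalg.det(U @ Vt))  # force a proper rotation (det = +1)
    return X @ (U @ D @ Vt)


def gpa(shapes, tol=1e-10, max_iter=100):
    """Partial GPA on an (n, k, m) array: returns aligned shapes and their consensus."""
    aligned = np.array([center_and_scale(s) for s in shapes])
    mean = aligned[0]  # initial reference: the first specimen
    for _ in range(max_iter):
        aligned = np.array([rotate_onto(s, mean) for s in aligned])
        new_mean = center_and_scale(aligned.mean(axis=0))
        converged = np.linalg.norm(new_mean - mean) < tol
        mean = new_mean
        if converged:
            break
    return aligned, mean


def align_test_to_train(test_shapes, train_mean):
    """Superimpose new specimens onto a fixed training consensus, one at a time."""
    return np.array([rotate_onto(center_and_scale(s), train_mean) for s in test_shapes])


# Simulated 2D data: 40 noisy copies of a random 10-landmark template.
rng = np.random.default_rng(0)
n, k, m = 40, 10, 2
template = rng.normal(size=(k, m))
shapes = template + 0.05 * rng.normal(size=(n, k, m))
train, test = shapes[:30], shapes[30:]

# Leakage-free pipeline: GPA on the training set only, then realign the test set.
train_aligned, train_mean = gpa(train)
test_aligned = align_test_to_train(test, train_mean)

# Contaminated pipeline: GPA on all specimens before splitting.
full_aligned, _ = gpa(shapes)

# Swap in different test specimens and rerun both pipelines.
shapes_alt = shapes.copy()
shapes_alt[30:] = rng.normal(size=(10, k, m))
full_alt, _ = gpa(shapes_alt)
train_alt, _ = gpa(shapes_alt[:30])

leak = not np.allclose(full_aligned[:30], full_alt[:30])  # training coordinates moved
no_leak = np.allclose(train_aligned, train_alt)  # training coordinates unchanged
print(f"full-sample GPA leaks test data into training coordinates: {leak}")
print(f"train-only GPA is unaffected by the test set: {no_leak}")
```

Because `align_test_to_train` never updates the consensus, each test specimen is processed independently of every other specimen, which is the property the realignment is meant to guarantee.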

Geometric morphometrics (GMM) is widely used to quantify shape variation, more recently serving as input for machine learning (ML) analyses. Standard practice aligns all specimens via Generalized Procrustes Analysis (GPA) prior to splitting data into training and test sets, potentially introducing statistical dependence and contaminating downstream predictive models. Here, the effects of GPA-induced contamination are formally characterised using controlled 2D and 3D simulations across varying sample sizes, landmark densities, and allometric patterns. A novel realignment procedure is proposed, whereby test specimens are aligned to the training set prior to model fitting, eliminating cross-sample dependency. Simulations reveal a robust "diagonal" in sample-size vs. landmark-space, reflecting the scaling of RMSE under isotropic variation, with slopes analytically derived from the degrees of freedom in Procrustes tangent space. The importance of spatial autocorrelation among landmarks is further demonstrated using linear and convolutional regression models, highlighting performance degradation when landmark relationships are ignored. This work establishes the need for careful preprocessing in ML applications of GMM, provides practical guidelines for realignment, and clarifies fundamental statistical constraints inherent to Procrustes shape space.
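For reference, the dimension of Procrustes tangent space follows from the standard count of similarity parameters removed by superimposition. Writing k for the number of landmarks and m for the spatial dimension (notation assumed here, not taken from the abstract), translation removes m parameters, rotation m(m-1)/2, and scaling 1. Only this count is shown; the RMSE slopes the authors derive from it are not reproduced here.

```latex
q(k, m) = km - \underbrace{m}_{\text{translation}}
             - \underbrace{\tfrac{m(m-1)}{2}}_{\text{rotation}}
             - \underbrace{1}_{\text{scale}},
\qquad q(k, 2) = 2k - 4, \qquad q(k, 3) = 3k - 7
```

With k = 10 landmarks, for example, q = 16 in 2D and q = 23 in 3D; each additional landmark adds m dimensions while the removed overhead stays fixed.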
