LG ML Nov 23, 2025

The Generalized Proximity Forest

Ben Shaw, Adam Rustad, Sofia Pelagalli Maia, Jake S. Rhodes, Kevin R. Moon

arXiv:2511.19487v1

Originality: Incremental advance

AI Analysis

This work provides a more flexible framework for using proximities in machine learning, but it is an incremental advance because it builds on the existing Proximity Forest model.

The paper addresses a limitation of Random Forest (RF) proximities: their usefulness depends on how well the RF model itself suits the task. It introduces the generalized Proximity Forest model, which extends these proximities to all supervised distance-based machine learning contexts, including regression and meta-learning, and it reports experimental advantages over RF and k-nearest neighbors models.

Recent work has demonstrated the utility of Random Forest (RF) proximities for various supervised machine learning tasks, including outlier detection, missing data imputation, and visualization. However, the utility of the RF proximities depends upon the success of the RF model, which itself is not the ideal model in all contexts. RF proximities have recently been extended to time series by means of the distance-based Proximity Forest (PF) model, among others, affording time series analysis with the benefits of RF proximities. In this work, we introduce the generalized PF model, thereby extending RF proximities to all contexts in which supervised distance-based machine learning can occur. Additionally, we introduce a variant of the PF model for regression tasks. We also introduce the notion of using the generalized PF model as a meta-learning framework, extending supervised imputation capability to any pre-trained classifier. We experimentally demonstrate the unique advantages of the generalized PF model compared with both the RF model and the $k$-nearest neighbors model.
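To make the idea of a distance-based forest proximity concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes a simplified Proximity-Forest-style tree (each node draws one random exemplar per class and routes samples to the nearest exemplar, with no search over candidate splits or distance measures) and the classic leaf co-occurrence proximity, i.e. the fraction of trees in which two samples land in the same leaf. The names `build_tree`, `leaf_path`, `forest_proximities`, and `euclidean` are hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict


def euclidean(a, b):
    """Plain Euclidean distance; any user-supplied distance could be used instead."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def nearest_exemplar(x, exemplars, distance):
    """Index of the exemplar closest to x (first index wins on ties)."""
    return min(range(len(exemplars)), key=lambda k: distance(x, exemplars[k]))


def build_tree(X, y, idx, distance, rng, depth=0, max_depth=10):
    """Simplified assumption: one random exemplar per class, a single candidate split."""
    if len({y[i] for i in idx}) <= 1 or depth >= max_depth:
        return {"leaf": True}
    by_class = defaultdict(list)
    for i in idx:
        by_class[y[i]].append(i)
    exemplars = [X[rng.choice(members)] for members in by_class.values()]
    branches = defaultdict(list)
    for i in idx:
        branches[nearest_exemplar(X[i], exemplars, distance)].append(i)
    if len(branches) <= 1:  # split failed to separate anything, e.g. duplicate points
        return {"leaf": True}
    children = {
        k: build_tree(X, y, members, distance, rng, depth + 1, max_depth)
        for k, members in branches.items()
    }
    return {"leaf": False, "exemplars": exemplars, "children": children}


def leaf_path(tree, x, distance):
    """Identify a leaf by the sequence of branch indices taken to reach it."""
    path, node = [], tree
    while not node["leaf"]:
        k = nearest_exemplar(x, node["exemplars"], distance)
        path.append(k)
        if k not in node["children"]:  # branch that received no training samples
            break
        node = node["children"][k]
    return tuple(path)


def forest_proximities(X, y, n_trees=50, distance=euclidean, seed=0):
    """Fraction of trees in which samples i and j share a leaf."""
    rng = random.Random(seed)
    n = len(X)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_trees):
        tree = build_tree(X, y, list(range(n)), distance, rng)
        leaves = [leaf_path(tree, x, distance) for x in X]
        for i in range(n):
            for j in range(n):
                if leaves[i] == leaves[j]:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]
```

Because the tree only ever touches the data through `distance`, the same sketch would apply to any objects for which a distance can be computed (time series with an elastic distance, for example), which is the sense in which such a model is distance-based. The resulting proximity matrix is symmetric with ones on the diagonal.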
