LG | ML | Jan 27

RFX-Fuse: Breiman and Cutler's Unified ML Engine + Native Explainable Similarity

arXiv:2603.13234
Originality: Incremental advance
AI Analysis

The paper addresses the fragmentation of ML workflows by giving practitioners a more integrated and efficient solution. The advance is incremental, because it revives and extends an existing vision rather than proposing a new one.

The paper tackles the fragmentation of modern machine learning pipelines by introducing RFX-Fuse, a unified engine that implements Breiman and Cutler's original Random Forest vision. It covers classification, regression, unsupervised learning, similarity, outlier detection, imputation, and visualization in a single model object, so that one or two model objects replace the 5+ separate tools a typical pipeline requires.

Breiman and Cutler's original Random Forest was designed as a unified ML engine -- not merely an ensemble predictor. Their implementation included classification, regression, unsupervised learning, proximity-based similarity, outlier detection, missing value imputation, and visualization -- capabilities that modern libraries like scikit-learn never implemented. RFX-Fuse (Random Forests X [X=compression] -- Forest Unified Learning and Similarity Engine) delivers Breiman and Cutler's complete vision with native GPU/CPU support. Modern ML pipelines require 5+ separate tools -- XGBoost for prediction, FAISS for similarity, SHAP for explanations, Isolation Forest for outliers, custom code for importance. RFX-Fuse provides a 1 to 2 model object alternative -- a single set of trees grown once. Novel Contributions: (1) Proximity Importance -- native explainable similarity: proximity measures that samples are similar; proximity importance explains why. (2) Dataset-specific imputation validation for general tabular data -- ranking imputation methods by how real the imputed data looks, without ground truth labels.
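
The proximity idea in the abstract can be illustrated with a minimal sketch. It assumes the classical Breiman and Cutler definition, where the proximity of two samples is the fraction of trees in which they land in the same leaf, and a simplified raw outlier score (sample count divided by the sum of squared proximities to the other samples, ignoring class labels). The function names and the `leaf_ids` input are hypothetical and are not taken from RFX-Fuse; the paper's own implementation may differ.

```python
from typing import List


def proximity_matrix(leaf_ids: List[List[int]]) -> List[List[float]]:
    """Classical forest proximity.

    leaf_ids[i][t] is the leaf that sample i reaches in tree t
    (hypothetical input; any forest library that exposes leaf
    indices could supply it).
    """
    n = len(leaf_ids)
    n_trees = len(leaf_ids[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Count the trees where both samples share a leaf.
            shared = sum(1 for a, b in zip(leaf_ids[i], leaf_ids[j]) if a == b)
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


def raw_outlier_scores(prox: List[List[float]]) -> List[float]:
    """Simplified raw outlier score: n / sum of squared proximities to others.

    Assumption: class labels are ignored, so every other sample counts.
    A sample that never shares a leaf with any other gets an infinite score.
    """
    n = len(prox)
    scores = []
    for i in range(n):
        total = sum(prox[i][k] ** 2 for k in range(n) if k != i)
        scores.append(n / total if total > 0 else float("inf"))
    return scores


# Made-up leaf assignments for 4 samples in a 3-tree forest.
leaf_ids = [
    [1, 4, 2],
    [1, 4, 3],
    [1, 5, 3],
    [7, 8, 9],  # never shares a leaf with the others
]
prox = proximity_matrix(leaf_ids)
scores = raw_outlier_scores(prox)
```

Samples 0 and 1 share a leaf in two of three trees, so their proximity is 2/3, while sample 3 shares none and receives the largest outlier score. With scikit-learn, the `apply` method of a fitted forest returns leaf indices in this sample-by-tree layout, so its output could serve as `leaf_ids` after conversion to nested lists.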

