Multi-RF Fusion with Multi-GNN Blending for Molecular Property Prediction
This work offers computational chemistry researchers a small but measurable improvement (0.8476 vs. 0.8475 test ROC-AUC over the previous leaderboard entry), though it is incremental in that it builds on existing ensemble techniques.
The paper tackles molecular property prediction on the ogbg-molhiv dataset with a hybrid ensemble of Random Forests and Graph Neural Networks, reaching a leaderboard-topping test ROC-AUC of 0.8476 with reduced seed-to-seed variance.
Multi-RF Fusion achieves a test ROC-AUC of 0.8476 +/- 0.0002 on ogbg-molhiv (10 seeds), placing #1 on the OGB leaderboard ahead of HyperFusion (0.8475 +/- 0.0003). The core of the method is a rank-averaged ensemble of 12 Random Forest models trained on concatenated molecular fingerprints (FCFP, ECFP, MACCS, atom pairs; 4,263 dimensions in total), blended with deep-ensembled GNN predictions at 12% weight. Two findings drive the result: (1) setting max_features to 0.20 instead of the default sqrt(d) yields a +0.008 AUC gain on this scaffold split, and (2) averaging GNN predictions across 10 seeds before blending with the RF removes the GNN's seed variance from the blend, because the GNN term is then identical in every run; this lowers the final standard deviation from 0.0008 to 0.0002. No external data or pre-training is used.
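The blending step described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' implementation: the function names (rank_normalize, rank_average, blend) are hypothetical, and two details are assumptions the summary does not state, namely that ranks are scaled to [0, 1] with ties sharing an average rank, and that the seed-averaged GNN scores are rank-normalized before being mixed in at 12% weight.

```python
from statistics import mean


def rank_normalize(scores):
    """Map scores to average ranks scaled into [0, 1]; tied scores share a rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of scores tied with position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2.0  # average 0-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    denom = max(n - 1, 1)  # avoid division by zero for a single sample
    return [r / denom for r in ranks]


def rank_average(model_scores):
    """Average the normalized ranks of several models, sample by sample."""
    ranked = [rank_normalize(s) for s in model_scores]
    return [mean(col) for col in zip(*ranked)]


def blend(rf_model_scores, gnn_seed_scores, gnn_weight=0.12):
    """Blend rank-averaged RF scores with seed-averaged GNN scores.

    Assumption: the GNN mean is rank-normalized so both terms share a scale.
    """
    rf_rank = rank_average(rf_model_scores)
    # Deep ensemble: average the GNN predictions over seeds first, so the
    # same GNN term enters every RF run.
    gnn_mean = [mean(col) for col in zip(*gnn_seed_scores)]
    gnn_rank = rank_normalize(gnn_mean)
    return [(1 - gnn_weight) * r + gnn_weight * g
            for r, g in zip(rf_rank, gnn_rank)]


if __name__ == "__main__":
    rf = [[0.1, 0.9, 0.5], [0.2, 0.8, 0.4]]    # toy scores from 2 RF models
    gnn = [[0.6, 0.2, 0.4], [0.8, 0.4, 0.2]]   # toy scores from 2 GNN seeds
    print(blend(rf, gnn))
```

Because ROC-AUC depends only on the ordering of scores, working in rank space keeps one model with an unusual score scale from dominating the average. On the max_features finding: if the forests are scikit-learn's RandomForestClassifier (an assumption; the summary names only the parameter), a float value is read as a fraction of the features, so 0.20 of 4,263 dimensions considers roughly 850 candidate features per split, against about 65 for sqrt(d).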