LG · May 13

Force-Aware Neural Tangent Kernels for Scalable and Robust Active Learning of MLIPs

Eszter Varga-Umbrich, Zachary Weller-Davies, Paul Duckworth, Jules Tilly, Olivier Peltre, Shikha Surana

arXiv:2605.13788 · 8.5

Predicted impact: top 62% in LG · last 90 days · Originality: Incremental advance

AI Analysis

This work provides a scalable and robust active learning method for fine-tuning foundation MLIPs, addressing practical challenges of large candidate pools, force supervision, and distribution shift.

The authors introduce a linearly scaling active learning framework for MLIPs that uses chunked feature-space posterior-variance shortlisting and a force-aware Neural Tangent Kernel, achieving the lowest energy and force MAE/RMSE on OC20 and remaining competitive on other benchmarks while being more efficient than committee-based methods.
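A minimal sketch of how feature-space posterior-variance shortlisting can scale linearly in the pool size. It assumes a Bayesian linear model on fixed per-structure embeddings (for example, pooled features from a pretrained MLIP) with unit prior variance and Gaussian noise of variance `noise_var`. The function name and its parameters are hypothetical illustrations, not the authors' code. Only the D×D weight-space precision matrix is formed; the N×N train kernel and the candidate kernel are never materialised, so candidates stream through in chunks at roughly O(M·D²) cost and O(D² + chunk·D) memory.

```python
import heapq

import numpy as np


def chunked_posterior_variance_shortlist(train_feats, cand_feats, k,
                                         chunk_size=4096, noise_var=1e-2):
    """Return indices of the k candidates with the largest posterior variance.

    Hypothetical helper. Model: f(x) = w^T phi(x), prior w ~ N(0, I),
    observation noise variance noise_var. The predictive variance is
        var(x) = phi^T (Phi^T Phi / noise_var + I)^{-1} phi,
    which by the Woodbury identity equals the kernel form
        k(x, x) - k_x^T (K + noise_var I)^{-1} k_x  with  k = phi^T phi.
    """
    d = train_feats.shape[1]
    # D x D precision matrix: cost is linear in the number of training structures.
    precision = train_feats.T @ train_feats / noise_var + np.eye(d)
    chol = np.linalg.cholesky(precision)  # precision = L L^T

    heap = []  # min-heap of (variance, index) holding the current top-k
    for start in range(0, cand_feats.shape[0], chunk_size):
        chunk = cand_feats[start:start + chunk_size]
        # phi^T precision^{-1} phi = ||L^{-1} phi||^2, computed per column.
        z = np.linalg.solve(chol, chunk.T)   # shape (D, chunk)
        var = np.sum(z * z, axis=0)          # shape (chunk,)
        for offset, v in enumerate(var):
            item = (float(v), start + offset)
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                heapq.heapreplace(heap, item)

    # Highest variance first.
    return [idx for _, idx in sorted(heap, reverse=True)]
```

For example, with training features all aligned with the first axis, candidates pointing along unseen directions receive the largest variance and are shortlisted first. The chunk size only trades memory for the number of solves and does not change the result. Selecting a diverse batch (for instance by greedily adding each pick to `precision` with a rank-one update) is a natural extension that this sketch does not show.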

Active learning for machine-learning interatomic potentials (MLIPs) must address several challenges to be practical: scaling to large candidate pools, leveraging energy-force supervision, and maintaining robustness when candidate pools are biased relative to the target distribution. In this work, we jointly address these challenges. We first introduce a linearly scaling acquisition framework based on chunked feature-space posterior-variance shortlisting. By avoiding materialisation of the candidate and train set kernels, this approach enables screening of ~200k structures within hours and applies broadly to acquisition strategies that score candidates based on molecular similarity metrics. We then extend the Neural Tangent Kernel (NTK) to a force-aware setting via mixed parameter-coordinate derivatives, yielding a force NTK and a joint energy-force NTK that provide natural similarity metrics for vector-field prediction. We demonstrate the effectiveness of the joint energy-force NTK on the OC20 dataset, where force-aware acquisition is crucial: it achieves the lowest energy and force MAE and RMSE across all metrics and distribution splits. Across T1x, PMechDB, and RGD benchmarks, our force NTK methods remain competitive with established baselines while being significantly more efficient than committee-based approaches. Under a controlled candidate-pool shift case study on T1x, acquisition based on pretrained MLIP embeddings and NTKs remains robust, whereas committee-based methods exhibit higher variance. Overall, these results show that a single pretrained MLIP can enable scalable, force-aware, and distribution-robust active learning for foundation-model fine-tuning.
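To make "mixed parameter-coordinate derivatives" concrete, here is a minimal sketch under loudly stated assumptions. A toy two-parameter pair potential E = Σ_{i<j} a·exp(−b·d_ij²) with θ = (a, b) stands in for a pretrained MLIP, and all functions are hypothetical illustrations rather than the paper's implementation. Forces are F = −∂E/∂r, so the force Jacobian ∂F/∂θ = −∂²E/(∂r ∂θ) is exactly a mixed derivative. The energy NTK is ∇_θE(x)·∇_θE(x′), the force NTK is J_F(x) J_F(x′)ᵀ (a 3n×3n′ block), and stacking the energy row on top of the force rows gives a joint energy-force NTK. Derivatives are written analytically here; for a real network one would typically obtain them with automatic differentiation (for example, `jax.jacfwd` applied to `jax.grad`).

```python
import numpy as np


def _pairs(pos):
    """Pairwise displacements r_i - r_j, squared distances, off-diagonal mask."""
    diff = pos[:, None, :] - pos[None, :, :]   # (n, n, 3)
    d2 = np.sum(diff * diff, axis=-1)          # (n, n)
    mask = ~np.eye(len(pos), dtype=bool)       # exclude i == j
    return diff, d2, mask


def energy(pos, theta):
    """Toy pair potential E = sum_{i<j} a * exp(-b * d_ij^2), theta = (a, b)."""
    a, b = theta
    _, d2, mask = _pairs(pos)
    return 0.5 * np.sum(a * np.exp(-b * d2) * mask)


def forces(pos, theta):
    """Analytic forces F_i = -dE/dr_i, shape (n, 3)."""
    a, b = theta
    diff, d2, mask = _pairs(pos)
    w = 2.0 * a * b * np.exp(-b * d2) * mask
    return np.einsum("ij,ijk->ik", w, diff)


def energy_jacobian(pos, theta):
    """dE/dtheta, shape (P,) with P = 2."""
    a, b = theta
    _, d2, mask = _pairs(pos)
    g = np.exp(-b * d2) * mask
    return np.array([0.5 * np.sum(g), -0.5 * a * np.sum(d2 * g)])


def force_jacobian(pos, theta):
    """Mixed derivatives dF/dtheta = -d^2E/(dr dtheta), shape (3n, P)."""
    a, b = theta
    diff, d2, mask = _pairs(pos)
    g = np.exp(-b * d2) * mask
    dF_da = np.einsum("ij,ijk->ik", 2.0 * b * g, diff)
    dF_db = np.einsum("ij,ijk->ik", 2.0 * a * g * (1.0 - b * d2), diff)
    return np.stack([dF_da.ravel(), dF_db.ravel()], axis=1)


def joint_jacobian(pos, theta):
    """Energy row stacked on force rows, shape (1 + 3n, P)."""
    return np.vstack([energy_jacobian(pos, theta)[None, :], force_jacobian(pos, theta)])


def joint_ntk(pos_x, pos_y, theta):
    """Joint energy-force NTK J(x) J(y)^T, shape (1 + 3n_x, 1 + 3n_y).

    Entry [0, 0] is the energy NTK; the block [1:, 1:] is the force NTK.
    """
    return joint_jacobian(pos_x, theta) @ joint_jacobian(pos_y, theta).T
```

How the paper weights the energy and force blocks, or reduces them to a single similarity score for acquisition, is not specified here; the sketch simply returns the full block matrix so that any such reduction can be applied on top.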

