Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction
This work addresses the challenge of predicting molecular properties from limited data, a problem central to drug discovery and materials science. The contribution is incremental, since it builds on existing few-shot learning algorithms.
The paper tackles few-shot molecular property prediction by developing embeddings that encode complex molecular characteristics, such as 3D geometries and chemical interactions. Training from these embeddings improves the performance of Multi-Task, MAML, and Prototypical Network methods on multiple benchmarks.
Few-shot learning is a promising approach to molecular property prediction because supervised data is often very limited. However, many important molecular properties depend on complex molecular characteristics -- such as the various 3D geometries a molecule may adopt or the types of chemical interactions it can form -- that are not explicitly encoded in the feature space and must be approximated from small amounts of data. Learning these characteristics can be difficult, especially for few-shot learning algorithms that are designed for fast adaptation to new tasks. In this work, we develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction. Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations, and a multi-task learning paradigm to structure the embedding space. On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance. Our code is available at https://github.com/cfifty/IGNITE.
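To make the few-shot setting concrete, the following is a minimal sketch of how a Prototypical Network classifies a query molecule once embeddings are available: each class prototype is the mean of that class's support embeddings, and the query is assigned to the nearest prototype under squared Euclidean distance. The 2-D vectors, the labels, and the function names (`class_prototypes`, `squared_distance`, `predict`) are illustrative assumptions; they are not taken from the IGNITE code, and the toy vectors are not embeddings produced by the method described above.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def class_prototypes(
    support: Sequence[Sequence[float]], labels: Sequence[int]
) -> Dict[int, List[float]]:
    """Average the support embeddings of each class into one prototype."""
    grouped = defaultdict(list)
    for vec, label in zip(support, labels):
        grouped[label].append(vec)
    # zip(*vecs) iterates over embedding dimensions, so each prototype
    # entry is the per-dimension mean over that class's support vectors.
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(query: Sequence[float], prototypes: Dict[int, List[float]]) -> int:
    """Return the label of the prototype closest to the query embedding."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# Hypothetical 2-D embeddings for a binary property (1 = active, 0 = inactive).
# In practice these would come from a pretrained embedding model.
support = [[0.9, 0.1], [1.1, -0.1], [-1.0, 0.2], [-0.8, 0.0]]
labels = [1, 1, 0, 0]

prototypes = class_prototypes(support, labels)
prediction = predict([0.7, 0.0], prototypes)
```

In this setup only the prototypes are computed from the few labeled examples, so the quality of the prediction rests largely on how well the embedding space separates the classes, which is the part the abstract proposes to improve.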