Icospherical Chemical Objects (ICOs) allow for chemical data augmentation and maintain rotational, translational and permutation invariance
This work addresses the need for rotationally invariant 3-D encodings in chemistry machine learning, offering a domain-specific solution for augmenting small datasets.
The authors address the problem of small chemistry datasets by introducing Icospherical Chemical Objects (ICOs), a 3-D encoding method that maintains rotational invariance and enables data augmentation. They demonstrate it on three tasks: predicting general molecular properties, predicting the solubility of drug-like molecules, and predicting protein binding, and report that it performs well on all three.
Chemistry datasets are often small, and dataset augmentation is a common way to deal with small datasets. Spherical convolutional neural networks (SphNNs) and icosahedral neural networks (IcoNNs) are geometric machine learning algorithms that maintain rotational symmetry. Molecular structure is rotationally invariant and inherently 3-D, so 3-D encoding methods are needed to input molecular structure into machine learning models. In this paper I present Icospherical Chemical Objects (ICOs), which encode 3-D data in a rotationally invariant way, work with spherical or icosahedral neural networks, and allow for dataset augmentation. I demonstrate the ICO featurisation method on three tasks: predicting general molecular properties, predicting the solubility of drug-like molecules, and the protein binding problem. ICOs combined with SphNNs perform well on all three.
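The idea of augmenting a dataset by rotation can be illustrated with a minimal sketch. This is not the ICO encoding itself, only an illustration of the general principle under stated assumptions: the function names (`random_rotation_matrix`, `augment_by_rotation`, `pairwise_distances`) are hypothetical, the water geometry is an approximate illustrative example, and the pairwise-distance matrix merely stands in for any rotation-invariant quantity.

```python
import numpy as np


def random_rotation_matrix(rng):
    """Draw a random proper rotation (determinant +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    # Fix the sign ambiguity of the QR factorisation column by column.
    q = q * np.sign(np.diag(r))
    # Flip one axis if needed so the result is a rotation, not a reflection.
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


def augment_by_rotation(coords, n_copies, seed=0):
    """Return n_copies randomly rotated copies of a centred molecule."""
    rng = np.random.default_rng(seed)
    centred = coords - coords.mean(axis=0)  # removes dependence on translation
    return [centred @ random_rotation_matrix(rng).T for _ in range(n_copies)]


def pairwise_distances(coords):
    """Interatomic distance matrix: an example of a rotation-invariant quantity."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


# Approximate water geometry in angstroms (illustrative values only): O, H, H.
water = np.array([
    [0.000, 0.000, 0.117],
    [0.000, 0.757, -0.469],
    [0.000, -0.757, -0.469],
])

copies = augment_by_rotation(water, n_copies=4)

# Each copy has different coordinates but the same interatomic distances.
for rotated in copies:
    assert np.allclose(pairwise_distances(rotated), pairwise_distances(water))
```

Each rotated copy is a distinct set of coordinates that describes the same molecule, which is what makes rotation usable for augmentation; any rotation-invariant quantity or model should treat all copies identically.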