Optimal radial basis for density-based atomic representations
This work addresses the design of efficient atomic representations for materials science and chemistry, offering an incremental improvement over heuristic basis-optimization methods.
The authors optimize the basis set used in atomic-scale machine-learning representations with an unsupervised method that determines the most compact basis for a given dataset. The resulting representations are accurate and computationally efficient, especially for high body-order correlations, as demonstrated on molecular and condensed-phase models.
The input of almost every machine learning algorithm targeting the properties of matter at the atomic scale involves a transformation of the list of Cartesian atomic coordinates into a more symmetric representation. Many of the most popular representations can be seen as an expansion of the symmetrized correlations of the atom density, and differ mainly by the choice of basis. Considerable effort has been dedicated to the optimization of the basis set, typically driven by heuristic considerations on the behavior of the regression target. Here we take a different, unsupervised viewpoint, aiming to determine the basis that encodes in the most compact way possible the structural information that is relevant for the dataset at hand. For each training dataset and number of basis functions, one can determine a unique basis that is optimal in this sense, and that can be evaluated at no additional cost relative to the primitive basis by approximating it with splines. We demonstrate that this construction yields representations that are accurate and computationally efficient, particularly for features that correspond to high body-order correlations. We present examples that involve both molecular and condensed-phase machine-learning models.
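To make the idea of a data-driven, compact basis concrete, the following is a minimal sketch and not the authors' implementation. It assumes that "most compact" is read in the sense of retained variance, as in principal component analysis, and that the primitive radial basis is orthonormal. The function name `optimal_contraction`, the sine primitive basis, the synthetic coefficients, and all array sizes are hypothetical choices made only for illustration.

```python
import numpy as np


def optimal_contraction(coeffs, n_opt):
    """PCA-style contraction of a primitive basis (illustrative assumption).

    coeffs : (n_env, n_prim) density expansion coefficients, one row per
             atomic environment, on an orthonormal primitive radial basis.
    n_opt  : number of contracted basis functions to keep.
    Returns the (n_prim, n_opt) contraction matrix and the fraction of the
    total (uncentered) variance retained by the contracted basis.
    """
    # Uncentered covariance of the coefficients across the dataset
    # (whether to center is itself an assumption of this sketch).
    cov = coeffs.T @ coeffs / coeffs.shape[0]
    eigval, eigvec = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]       # re-sort: largest variance first
    eigval, eigvec = eigval[order], eigvec[:, order]
    retained = eigval[:n_opt].sum() / eigval.sum()
    return eigvec[:, :n_opt], retained


# Synthetic stand-in for real expansion coefficients: the data occupy a
# 4-dimensional subspace of a 12-function primitive basis, plus small noise.
rng = np.random.default_rng(0)
n_env, n_prim, n_opt = 500, 12, 4
latent = rng.normal(size=(n_env, n_opt))
mixing = rng.normal(size=(n_opt, n_prim))
coeffs = latent @ mixing + 1e-3 * rng.normal(size=(n_env, n_prim))

U, retained = optimal_contraction(coeffs, n_opt)
contracted = coeffs @ U                    # (n_env, n_opt) compact features

# Hypothetical primitive radial functions, orthonormal on [0, r_cut].
r_cut = 5.0
r = np.linspace(0.0, r_cut, 200)
n = np.arange(1, n_prim + 1)
primitive = np.sqrt(2.0 / r_cut) * np.sin(np.pi * r[:, None] * n[None, :] / r_cut)

# Contracted radial functions tabulated on the grid: each one is a fixed
# linear combination of the primitive functions.
optimal_radial = primitive @ U             # (200, n_opt)
```

Because each contracted function is a fixed linear combination of the primitive ones, the tabulated columns of `optimal_radial` could be handed to any spline interpolator, so that evaluating a contracted function costs a single spline lookup rather than a sum over the primitive basis. This is one plausible reading of the "no additional cost" statement above, not a description of the authors' code.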