LG ATOM-PH ML

Mar 30, 2020

From Patterson Maps to Atomic Coordinates: Training a Deep Neural Network to Solve the Phase Problem for a Simplified Case

arXiv:2003.13767v1

5 citations

Originality: Incremental advance

AI Analysis

This work addresses the phase problem for crystallographers, but it is incremental as it focuses on a simplified synthetic scenario.

The paper tackled the phase problem in crystallography by training a deep neural network to infer atomic coordinates from Patterson maps for a simplified case of 10 randomly positioned atoms, achieving generalization to unseen cases with synthetic data. It found that the network required uniquely described Patterson maps to train effectively, addressing conflicts through centering, centrosymmetric inversion handling, and spatial constraints.
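The two invariances mentioned above follow from the standard definition of the Patterson function as the autocorrelation of the electron density. The short derivation below is a sketch using conventional notation ($\rho$ for the density, $\mathbf{t}$ for a translation); the symbols are assumptions for illustration and are not taken from the paper.

```latex
\begin{align*}
P(\mathbf{u}) &= \int \rho(\mathbf{r})\,\rho(\mathbf{r}+\mathbf{u})\,d\mathbf{r}
  && \text{(definition)} \\
\rho_t(\mathbf{r}) = \rho(\mathbf{r}-\mathbf{t}):\quad
P_t(\mathbf{u}) &= \int \rho(\mathbf{r}-\mathbf{t})\,\rho(\mathbf{r}+\mathbf{u}-\mathbf{t})\,d\mathbf{r}
  = P(\mathbf{u})
  && \text{(substitute } \mathbf{s}=\mathbf{r}-\mathbf{t}\text{)} \\
\rho_i(\mathbf{r}) = \rho(-\mathbf{r}):\quad
P_i(\mathbf{u}) &= \int \rho(-\mathbf{r})\,\rho(-\mathbf{r}-\mathbf{u})\,d\mathbf{r}
  = \int \rho(\mathbf{s})\,\rho(\mathbf{s}-\mathbf{u})\,d\mathbf{s}
  = P(-\mathbf{u}) = P(\mathbf{u})
\end{align*}
```

The last equality holds because the Patterson function is even: substituting $\mathbf{s} \to \mathbf{s}+\mathbf{u}$ in $\int \rho(\mathbf{s})\,\rho(\mathbf{s}-\mathbf{u})\,d\mathbf{s}$ recovers the definition of $P(\mathbf{u})$. A structure, any translated copy of it, and its inverse therefore all produce the same Patterson map.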

This work demonstrates that, for a simple case of 10 randomly positioned atoms, a neural network can be trained to infer atomic coordinates from Patterson maps. The network was trained entirely on synthetic data. For the training set, the network outputs were 3D maps of randomly positioned atoms. From each output map, a Patterson map was generated and used as input to the network. The network generalized to cases not in the training set, inferring atom positions from Patterson maps. A key finding in this work is that the Patterson maps presented to the network input during training must uniquely describe the atomic coordinates they are paired with on the network output; otherwise the network will neither train nor generalize. The network cannot train on conflicting data. Conflicts are avoided in three ways:

1. Patterson maps are invariant to translation. To remove this degree of freedom, output maps are centered on the average of their atom positions.

2. Patterson maps are invariant to centrosymmetric inversion. This conflict is removed by presenting the network output with both the atoms used to make the Patterson map and their centrosymmetry-related counterparts simultaneously.

3. The Patterson map does not uniquely describe a set of coordinates because the origin for each vector in the Patterson map is ambiguous. By adding empty space around the atoms in the output map, this ambiguity is removed. Forcing output atoms to be closer together than half the output box edge dimension means the origin of each peak in the Patterson map must be the origin to which it is closest.
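The three conflict-avoidance steps described in the abstract can be illustrated with a small sketch. It is not the paper's code: it assumes point atoms on an integer grid, represents the Patterson map as a multiset of interatomic vectors rather than a 3D map, and involves no neural network. All names (`patterson_peaks`, `centre`, `with_inverse`, `nearest_origin`, `BOX`) are hypothetical.

```python
import random
from collections import Counter
from fractions import Fraction


def patterson_peaks(atoms):
    """Point-atom Patterson map: multiset of all vectors r_j - r_i.

    Pairs with i == j contribute the origin peak.
    """
    return Counter(
        tuple(b - a for a, b in zip(ri, rj))
        for ri in atoms
        for rj in atoms
    )


def centre(atoms):
    """Step 1: remove translation by moving the mean atom position to the origin."""
    n = len(atoms)
    mean = [Fraction(sum(axis), n) for axis in zip(*atoms)]  # exact arithmetic
    return [tuple(x - m for x, m in zip(r, mean)) for r in atoms]


def with_inverse(atoms):
    """Step 2: target holding the atoms and their centrosymmetric counterparts."""
    return atoms + [tuple(-x for x in r) for r in atoms]


def nearest_origin(u, box):
    """Step 3: map a periodic peak position to the image closest to the origin.

    Assumes an even integer box edge; each component lands in [-box/2, box/2).
    """
    half = box // 2
    return tuple((x + half) % box - half for x in u)


random.seed(0)
BOX = 32  # assumed box edge, in grid units
# Confine the 10 atoms to a cube of edge BOX/2 so every interatomic
# vector component is shorter than half the box edge.
atoms = [tuple(random.randrange(BOX // 2) for _ in range(3)) for _ in range(10)]

network_input = patterson_peaks(atoms)        # what the network would see
network_target = with_inverse(centre(atoms))  # what it would be asked to output

shifted = [tuple(x + 5 for x in r) for r in atoms]
inverted = [tuple(-x for x in r) for r in atoms]
print(patterson_peaks(shifted) == network_input)   # translation invariance
print(patterson_peaks(inverted) == network_input)  # inversion invariance
```

Because `atoms`, `shifted`, and `inverted` all yield the same Patterson multiset, pairing each with a different target would give the network conflicting training examples. Centering and adding the inverse collapse them onto a single target, and `nearest_origin` shows how the half-box constraint lets each wrapped peak be assigned to one origin.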
