LG QM May 4, 2023

Are VAEs Bad at Reconstructing Molecular Graphs?

Hagen Muenkler, Hubert Misztela, Michal Pikusa, Marwin Segler, Nadine Schneider, Krzysztof Maziarz

arXiv:2305.03041v16.63 citations h-index: 26

Originality: Synthesis-oriented

AI Analysis

This work examines the reconstruction performance gap of molecular graph VAEs, which is relevant to researchers in computational chemistry and machine learning. It is incremental: it clarifies existing limitations without proposing a new solution.

The study evaluates the reconstruction accuracy of state-of-the-art variational auto-encoders on molecular graphs and finds it surprisingly low on a large, chemically diverse dataset. It also shows that improving reconstruction does not directly lead to better sampling or optimization performance.

Many contemporary generative models of molecules are variational auto-encoders of molecular graphs. One term in their training loss pertains to reconstructing the input, yet reconstruction capabilities of state-of-the-art models have not yet been thoroughly compared on a large and chemically diverse dataset. In this work, we show that when several state-of-the-art generative models are evaluated under the same conditions, their reconstruction accuracy is surprisingly low, worse than what was previously reported on seemingly harder datasets. However, we show that improving reconstruction does not directly lead to better sampling or optimization performance. Failed reconstructions from the MoLeR model are usually similar to the inputs, assembling the same motifs in a different way, and possess similar chemical properties such as solubility. Finally, we show that the input molecule and its failed reconstruction are usually mapped by the different encoders to statistically distinguishable posterior distributions, hinting that posterior collapse may not fully explain why VAEs are bad at reconstructing molecular graphs.
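
To make the notion of reconstruction accuracy concrete, here is a minimal sketch that assumes it is measured as the fraction of input molecules whose encode-then-decode round trip returns exactly the same molecule. The abstract does not spell out the metric, so this definition is an assumption, and the names `reconstruction_accuracy`, `reconstruct`, `canonicalize`, and the toy lookup table are hypothetical stand-ins rather than anything from the paper or the MoLeR code.

```python
from typing import Callable, Iterable


def reconstruction_accuracy(
    molecules: Iterable[str],
    reconstruct: Callable[[str], str],
    canonicalize: Callable[[str], str] = lambda s: s,
) -> float:
    """Fraction of inputs whose reconstruction equals the input.

    `reconstruct` stands in for a model's encode-then-decode round trip.
    `canonicalize` should map equivalent molecule strings to one form;
    the identity default is only suitable for toy data.
    """
    total = 0
    matches = 0
    for mol in molecules:
        total += 1
        if canonicalize(reconstruct(mol)) == canonicalize(mol):
            matches += 1
    if total == 0:
        raise ValueError("need at least one molecule")
    return matches / total


# Hypothetical round-trip results, hard-coded for illustration.
# The third entry is a failed reconstruction: same atoms, different graph.
toy_round_trip = {
    "CCO": "CCO",
    "CCN": "CCN",
    "CCCO": "CC(C)O",
    "c1ccccc1": "c1ccccc1",
}

accuracy = reconstruction_accuracy(toy_round_trip.keys(), toy_round_trip.__getitem__)
print(accuracy)
```

With real data, the `canonicalize` argument would be a canonical SMILES function from a cheminformatics toolkit, so that two different strings for the same molecule are not counted as a failure.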
