Learning-Order Autoregressive Models with Application to Molecular Graph Generation
This work addresses the challenge of generating structured data, such as molecules for drug discovery. It is an incremental advance that adapts autoregressive models to learn dynamic generation orderings.
The paper tackles the problem of generating high-dimensional data that lack a natural ordering, such as graphs, by introducing an autoregressive model with a trainable probabilistic ordering policy. It achieves state-of-the-art results on the QM9 and ZINC250k molecular graph generation benchmarks, with improved distribution-similarity and drug-likeness metrics.
Autoregressive models (ARMs) have become the workhorse for sequence generation tasks, since many problems can be modeled as next-token prediction. While text appears to have a natural ordering (i.e., left-to-right), for many data types, such as graphs, the canonical ordering is less obvious. To address this problem, we introduce a variant of ARMs that generates high-dimensional data using a probabilistic ordering that is sequentially inferred from data. This model incorporates a trainable probability distribution, referred to as an order-policy, that dynamically decides the autoregressive order in a state-dependent manner. To train the model, we introduce a variational lower bound on the log-likelihood, which we optimize with stochastic gradient estimation. We demonstrate experimentally that our method can learn meaningful autoregressive orderings in image and graph generation. On the challenging domain of molecular graph generation, we achieve state-of-the-art results on the QM9 and ZINC250k benchmarks, evaluated across key metrics for distribution similarity and drug-likeness.
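The abstract does not spell out the model or the bound, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes the ordering σ is treated as a latent variable: a hypothetical order-policy picks the next free position given the tokens generated so far, and a hypothetical token model then emits a binary token there. Both are toy linear scorers (`W_POLICY`, `W_TOKEN`). The variational distribution q(σ | x) is a hypothetical Plackett-Luce-style model (`W_POST`) that sees the full x. The sketch evaluates the bound E_q[log p(x, σ) - log q(σ | x)] <= log p(x), both by sampling orders and, for tiny D, exactly. The stochastic gradient estimation used for training is omitted.

```python
import itertools
import math
import random

D = 3  # toy data: D positions, each holding a binary token (assumption)
_init = random.Random(0)
# Hypothetical linear parameters standing in for neural networks.
W_POLICY = [[_init.uniform(-1, 1) for _ in range(D)] for _ in range(D)]
W_TOKEN = [[_init.uniform(-1, 1) for _ in range(D)] for _ in range(D)]
W_POST = [[_init.uniform(-1, 1) for _ in range(D)] for _ in range(D)]


def log_softmax(scores):
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def dot(w, h):
    return sum(a * b for a, b in zip(w, h))


def log_joint(x, order):
    """log p(x, order): at each step the order-policy picks a free position,
    then a token is emitted there, both conditioned on the tokens generated so far."""
    total, filled = 0.0, set()
    for pos in order:
        # Partial state: +1/-1 for generated tokens, 0 for positions not yet generated.
        h = [2 * x[j] - 1 if j in filled else 0 for j in range(D)]
        free = [i for i in range(D) if i not in filled]
        policy = log_softmax([dot(W_POLICY[i], h) for i in free])
        total += policy[free.index(pos)]  # log order-policy(pos | state)
        p_one = 1.0 / (1.0 + math.exp(-dot(W_TOKEN[pos], h)))
        total += math.log(p_one if x[pos] == 1 else 1.0 - p_one)  # log p(token | pos, state)
        filled.add(pos)
    return total


def posterior_log_probs(x, free):
    """Hypothetical q: softmax choice among free positions, conditioned on the full x."""
    full = [2 * v - 1 for v in x]
    return log_softmax([dot(W_POST[i], full) for i in free])


def log_q(x, order):
    total = 0.0
    for t, pos in enumerate(order):
        free = [i for i in range(D) if i not in order[:t]]
        total += posterior_log_probs(x, free)[free.index(pos)]
    return total


def sample_order(x, rng):
    order = []
    while len(order) < D:
        free = [i for i in range(D) if i not in order]
        weights = [math.exp(lp) for lp in posterior_log_probs(x, free)]
        order.append(rng.choices(free, weights=weights)[0])
    return order


def elbo_estimate(x, num_samples, rng):
    """Monte Carlo estimate of E_q[log p(x, order) - log q(order | x)]."""
    draws = [sample_order(x, rng) for _ in range(num_samples)]
    return sum(log_joint(x, o) - log_q(x, o) for o in draws) / num_samples


def exact_log_px(x):
    """Marginalize over all D! orderings (only feasible for tiny D)."""
    return logsumexp([log_joint(x, list(o)) for o in itertools.permutations(range(D))])


def exact_elbo(x):
    orders = [list(o) for o in itertools.permutations(range(D))]
    return sum(math.exp(log_q(x, o)) * (log_joint(x, o) - log_q(x, o)) for o in orders)


if __name__ == "__main__":
    x = (1, 0, 1)
    print(exact_log_px(x), exact_elbo(x), elbo_estimate(x, 1000, random.Random(1)))
```

With D = 3, every ordering can be enumerated, so the exact bound can be checked against the exact marginal log p(x). For molecular graphs the number of orderings grows factorially, which is why a bound of this kind has to be estimated by sampling orders rather than computed exactly.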