Recursive Tree Grammar Autoencoders
This work addresses the problem of generating trees as output, which arises in applications such as drug discovery and intelligent tutoring systems, and reports improvements over existing methods.
This paper introduces Recursive Tree Grammar Autoencoders (RTG-AEs) for encoding and decoding trees, which are useful for applications like molecule optimization and hint generation. The model combines variational autoencoders, grammatical knowledge, and recursive processing, and it achieves lower autoencoding error, shorter training time, and better optimization scores than four baselines on synthetic and real datasets.
Machine learning on trees has mostly focused on trees as input to algorithms. Much less research has investigated trees as output, which has many applications, such as molecule optimization for drug discovery or hint generation for intelligent tutoring systems. In this work, we propose a novel autoencoder approach, called recursive tree grammar autoencoder (RTG-AE), which encodes trees via a bottom-up parser and decodes trees via a tree grammar, both learned via recursive neural networks that minimize the variational autoencoder loss. The resulting encoder and decoder can then be used in subsequent tasks, such as optimization and time series prediction. RTG-AEs are the first models to combine variational autoencoders, grammatical knowledge, and recursive processing. Our key message is that this unique combination of all three elements outperforms models that combine any two of the three. In particular, we perform an ablation study to show that our proposed method improves the autoencoding error, training time, and optimization score on synthetic as well as real datasets compared to four baselines. We further prove that RTG-AEs parse and generate trees in linear time and are expressive enough to handle all regular tree grammars.
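To make the encode/decode idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical one-nonterminal grammar for Boolean formulas (`RULES`), a small code dimension (`DIM`), and random untrained weights. The variational sampling step and the training loss are left out, and the `max_depth` cutoff is an added safeguard so that decoding with untrained weights terminates. The encoder applies one layer per grammar rule bottom-up, and the decoder picks a rule from the code vector and recurses into the children, so each node is processed once.

```python
import math
import random

# Hypothetical regular tree grammar with one nonterminal S:
#   S -> and(S, S) | or(S, S) | not(S) | x | y
# The dictionary maps each rule symbol to its arity (number of children).
RULES = {"and": 2, "or": 2, "not": 1, "x": 0, "y": 0}
RULE_NAMES = list(RULES)
DIM = 4  # assumed size of the code vector

rng = random.Random(0)  # fixed seed, weights are random and untrained


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def rand_layer(in_dim):
    # One affine layer mapping in_dim inputs to DIM outputs: (weights, bias).
    return rand_matrix(DIM, in_dim), [rng.gauss(0.0, 0.5) for _ in range(DIM)]


def affine_tanh(layer, v):
    W, b = layer
    return [math.tanh(sum(w * x for w, x in zip(row, v)) + bi)
            for row, bi in zip(W, b)]


# Encoder: one layer per rule, fed with the concatenated child codes.
ENC = {rule: rand_layer(arity * DIM) for rule, arity in RULES.items()}
# Decoder: a linear scorer over rules, plus one layer per rule and child slot.
SCORE_W = rand_matrix(len(RULE_NAMES), DIM)
DEC = {rule: [rand_layer(DIM) for _ in range(arity)]
       for rule, arity in RULES.items()}


def encode(tree):
    """Bottom-up pass: encode the children first, then apply the rule's layer."""
    symbol, children = tree
    flat = [x for child in children for x in encode(child)]
    return affine_tanh(ENC[symbol], flat)


def decode(code, depth=0, max_depth=3):
    """Top-down pass: choose the best-scoring rule, then decode each child."""
    scores = [sum(w * x for w, x in zip(row, code)) for row in SCORE_W]
    # Assumption of this sketch: at max_depth only leaf rules are allowed.
    allowed = [i for i, rule in enumerate(RULE_NAMES)
               if depth < max_depth or RULES[rule] == 0]
    symbol = RULE_NAMES[max(allowed, key=lambda i: scores[i])]
    children = [decode(affine_tanh(layer, code), depth + 1, max_depth)
                for layer in DEC[symbol]]
    return (symbol, children)


# Usage: a tree is a (symbol, children) pair.
tree = ("and", [("x", []), ("not", [("y", [])])])
code = encode(tree)
reconstruction = decode(code)
print(code)
print(reconstruction)
```

Because the weights are untrained, the decoded tree will generally differ from the input. What the sketch does guarantee is that every decoded tree respects the rule arities, since the decoder can only ever apply grammar rules.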