LGJan 22, 2021

A Study of Continuous Vector Representationsfor Theorem Proving

Stanisław Purgał, Julian Parsert, Cezary Kaliszyk

arXiv:2101.09142v16.57 citations

Originality Incremental advance

AI Analysis

This work addresses the challenge of applying AI to theorem proving by providing a novel encoding method, though it appears incremental as it builds on existing representation techniques.

The paper tackled the problem of representing mathematical formulas for machine learning by developing a reversible dense vector encoding that preserves logical properties, and demonstrated its viability on syntactic/semantic datasets and premise selection in the Mizar corpus.

Applying machine learning to mathematical terms and formulas requires a suitable representation of formulas that is adequate for AI methods. In this paper, we develop an encoding that allows for logical properties to be preserved and is additionally reversible. This means that the tree shape of a formula including all symbols can be reconstructed from the dense vector representation. We do that by training two decoders: one that extracts the top symbol of the tree and one that extracts embedding vectors of subtrees. The syntactic and semantic logical properties that we aim to reserve include both structural formula properties, applicability of natural deduction steps, and even more complex operations like unifiability. We propose datasets that can be used to train these syntactic and semantic properties. We evaluate the viability of the developed encoding across the proposed datasets as well as for the practical theorem proving problem of premise selection in the Mizar corpus.

View on arXiv PDF

Similar