Learning Structure-Aware Representations of Dependent Types
This work addresses the challenge of applying machine learning to dependently-typed programming languages like Agda, which is incremental in extending the Agda ecosystem into ML.
The authors tackled the problem of representing dependently-typed programs (specifically Agda code) for machine learning by creating the first extensive dataset of Agda program-proofs and proposing a novel neural architecture based on structural principles. In a premise selection evaluation, their method achieved promising results that surpassed strong baselines.
Agda is a dependently-typed programming language and a proof assistant, pivotal in proof formalization and programming language theory. This paper extends the Agda ecosystem into machine learning territory, and, vice versa, makes Agda-related resources available to machine learning practitioners. We introduce and release a novel dataset of Agda program-proofs that is elaborate and extensive enough to support various machine learning applications -- the first of its kind. Leveraging the dataset's ultra-high resolution, which details proof states at the sub-type level, we propose a novel neural architecture targeted at faithfully representing dependently-typed programs on the basis of structural rather than nominal principles. We instantiate and evaluate our architecture in a premise selection setup, where it achieves promising initial results, surpassing strong baselines.