AI · Jan 9, 2019

High-Fidelity Vector Space Models of Structured Data

arXiv:1901.02565v2
AI Analysis

This paper addresses the challenge of faithfully representing structured data for machine learning systems. The contribution is incremental in that it builds on existing vector space models.

The paper tackles the problem of representing structured data as fixed-size real-valued vectors for machine learning. It introduces an approach that compiles the data into a satisfiability problem whose solutions include the input, which makes precise vector representations possible. The method is demonstrated in automated reasoning and natural language processing, where the vectors can be translated back to their original structured forms.

Machine learning systems regularly deal with structured data in real-world applications. Unfortunately, such data has been difficult to faithfully represent in a way that most machine learning techniques would expect, i.e. as a real-valued vector of a fixed, pre-specified size. In this work, we introduce a novel approach that compiles structured data into a satisfiability problem which has in its set of solutions at least (and often only) the input data. The satisfiability problem is constructed from constraints which are generated automatically a priori from a given signature, thus trivially allowing for a bag-of-words-esque vector representation of the input to be constructed. The method is demonstrated in two areas, automated reasoning and natural language processing, where it is shown to produce vector representations of natural-language sentences and first-order logic clauses that can be precisely translated back to their original, structured input forms.
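To make the idea concrete, here is a minimal toy sketch of a constraint-based, bag-of-words-style encoding. It is not the paper's actual construction: the signature, the two constraint kinds ("root" and "edge"), the nested-tuple term format, and the brute-force decoder are all assumptions chosen for illustration. The constraints are enumerated from the signature alone, so the vector length is fixed before any input is seen. Decoding searches for every term that satisfies exactly the counted constraints.

```python
from itertools import product

# Hypothetical toy signature: symbol -> arity (not taken from the paper).
SIGNATURE = {"a": 0, "b": 0, "g": 1, "f": 2}


def constraints(signature):
    """Enumerate every constraint a term could satisfy, using only the signature."""
    cs = [("root", s) for s in sorted(signature)]
    for parent in sorted(signature):
        for i in range(signature[parent]):
            for child in sorted(signature):
                # "child" appears as argument i of "parent"
                cs.append(("edge", parent, i, child))
    return cs


CONSTRAINTS = constraints(SIGNATURE)
INDEX = {c: k for k, c in enumerate(CONSTRAINTS)}


def encode(term):
    """Count how often each constraint holds in a term given as nested tuples."""
    vec = [0] * len(CONSTRAINTS)
    vec[INDEX[("root", term[0])]] += 1
    stack = [term]
    while stack:
        node = stack.pop()
        for i, child in enumerate(node[1:]):
            vec[INDEX[("edge", node[0], i, child[0])]] += 1
            stack.append(child)
    return vec


def terms(signature, depth):
    """Yield all well-formed terms with nesting depth at most `depth`."""
    if depth == 0:
        return
    smaller = list(terms(signature, depth - 1))
    for symbol, arity in sorted(signature.items()):
        if arity == 0:
            yield (symbol,)
        else:
            for args in product(smaller, repeat=arity):
                yield (symbol, *args)


def decode(vec, signature, depth):
    """Brute-force stand-in for a solver: all terms whose encoding equals vec."""
    return [t for t in terms(signature, depth) if encode(t) == vec]


unique = ("f", ("g", ("a",)), ("b",))          # f(g(a), b)
ambiguous = ("f", ("g", ("a",)), ("g", ("b",)))  # f(g(a), g(b))

print(len(CONSTRAINTS))                               # fixed vector size
print(decode(encode(unique), SIGNATURE, 3))           # only the input comes back
print(decode(encode(ambiguous), SIGNATURE, 3))        # input plus one other term
```

In this toy encoding, f(g(a), b) decodes to itself alone, while f(g(a), g(b)) shares its vector with f(g(b), g(a)). That mirrors the abstract's wording that the solution set contains at least, and often only, the input. A real system would presumably hand the constraints to a satisfiability solver rather than enumerate terms.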

