word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings of Structured Data
This work addresses a gap in machine learning theory: it provides a framework for analyzing and improving embedding techniques, which may benefit researchers and practitioners in AI and data science.
The paper addresses the limited theoretical understanding of vector embeddings of structured data such as graphs. It surveys embedding methods used in practice and proposes two theoretical approaches to their foundations.
Vector representations of graphs and relational structures, whether hand-crafted feature vectors or learned representations, enable us to apply standard data analysis and machine learning techniques to the structures. A wide range of methods for generating such embeddings have been studied in the machine learning and knowledge representation literature. However, vector embeddings have received relatively little attention from a theoretical point of view. Starting with a survey of embedding techniques that have been used in practice, in this paper we propose two theoretical approaches that we see as central for understanding the foundations of vector embeddings. We draw connections between the various approaches and suggest directions for future research.
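To make the notion of a hand-crafted feature vector concrete, here is a minimal sketch in Python. It is an illustration only, not a method taken from the paper: the function name `degree_histogram_embedding`, the choice of a degree histogram as the feature, and the example graphs are all assumptions made for this example. It maps an undirected graph, given as an edge list over nodes `0..num_nodes-1`, to a fixed-length vector that does not depend on how the nodes are labelled.

```python
def degree_histogram_embedding(edges, num_nodes, dim):
    """Hypothetical hand-crafted graph embedding (illustrative assumption).

    Entry d of the result counts the nodes of degree d; degrees of
    dim - 1 or more are clipped into the last entry so that every
    graph maps to a vector of the same length.
    """
    degree = [0] * num_nodes
    for u, v in edges:
        # Each undirected edge contributes to the degree of both endpoints.
        degree[u] += 1
        degree[v] += 1

    vec = [0] * dim
    for d in degree:
        vec[min(d, dim - 1)] += 1
    return vec


# Example graphs on four nodes (made up for this sketch).
path = [(0, 1), (1, 2), (2, 3)]             # path 0-1-2-3
relabelled_path = [(2, 0), (0, 3), (3, 1)]  # the same path with nodes renamed
star = [(0, 1), (0, 2), (0, 3)]             # star centred at node 0

path_vec = degree_histogram_embedding(path, num_nodes=4, dim=4)
relabelled_vec = degree_histogram_embedding(relabelled_path, num_nodes=4, dim=4)
star_vec = degree_histogram_embedding(star, num_nodes=4, dim=4)

print(path_vec, relabelled_vec, star_vec)
```

The two labellings of the path receive the same vector, while the star receives a different one, so standard vector-based tools can tell these two structures apart. A feature this coarse also maps many non-isomorphic graphs to the same vector, and which structural properties an embedding preserves is the kind of question a theory of embeddings has to answer.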