Probing Taxonomic and Thematic Embeddings for Taxonomic Information
This work addresses how taxonomic information is encoded in word embeddings, a question relevant to natural language understanding in AI systems. The contribution is incremental, since it builds on existing probing methods and embedding models.
The paper investigates how taxonomic information is structurally encoded in word embeddings by designing a hypernym-hyponym probing task and comparing taxonomic and thematic SGNS and GloVe embeddings. It finds that both types encode some taxonomic information, that the amount and geometric properties of this encoding depend on both the encoder architecture and the training data, and that only taxonomic embeddings carry this information in their norm.
Modelling taxonomic and thematic relatedness is important for building AI with comprehensive natural language understanding. The goal of this paper is to learn more about how taxonomic information is structurally encoded in embeddings. To do this, we design a new hypernym-hyponym probing task and perform a comparative probing study of taxonomic and thematic SGNS and GloVe embeddings. Our experiments indicate that both types of embeddings encode some taxonomic information, but the amount of this information and the geometric properties of its encoding are independently related to both the encoder architecture and the embedding training data. Specifically, we find that only taxonomic embeddings carry taxonomic information in their norm, a property determined by the underlying distribution of the data.
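The abstract does not specify the probe's architecture or features, so the following is only a minimal sketch under stated assumptions. It assumes a logistic-regression probe over ordered word pairs (u, v), labelled 1 when v is the hypernym of u, and contrasts a probe that sees only the norm difference ||u|| - ||v|| with one that sees the full difference vector u - v. The embeddings are synthetic, not SGNS or GloVe vectors, and the two toy "spaces" are hypothetical: one plants the hypernymy signal in the vector norm, the other plants it in a direction while randomizing norms. The helper names (`make_pair`, `train_and_score`, and so on) are illustrative, and the example shows what "information carried in the norm" means for a probe, not the paper's data or results.

```python
import math
import random

# Hypothetical toy setup: synthetic vectors, NOT SGNS/GloVe embeddings.
DIM = 8  # illustrative embedding dimensionality


def gaussian_vec(rng, dim):
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def make_pair(rng, space):
    """Return (hyponym, hypernym) vectors from a synthetic embedding space.

    'norm' space: hypernyms have a larger norm but a random direction.
    'direction' space: hypernyms are shifted along dimension 0, and every
    vector is rescaled to a random norm that is independent of the label.
    """
    hypo, hyper = gaussian_vec(rng, DIM), gaussian_vec(rng, DIM)
    if space == "norm":
        hyper = [2.0 * x for x in hyper]
    else:
        hyper[0] += 4.0
        s_hypo, s_hyper = rng.uniform(0.5, 1.5), rng.uniform(0.5, 1.5)
        n_hypo, n_hyper = norm(hypo), norm(hyper)
        hypo = [s_hypo * x / n_hypo for x in hypo]
        hyper = [s_hyper * x / n_hyper for x in hyper]
    return hypo, hyper


def make_dataset(rng, space, n):
    """Ordered pairs (u, v, label); label is 1 iff v is the hypernym of u."""
    data = []
    for _ in range(n):
        hypo, hyper = make_pair(rng, space)
        label = 1 if rng.random() < 0.5 else 0
        u, v = (hypo, hyper) if label else (hyper, hypo)
        data.append((u, v, label))
    return data


def norm_features(u, v):
    return [norm(u) - norm(v)]  # the probe sees only the norm difference


def vector_features(u, v):
    return [a - b for a, b in zip(u, v)]  # the probe sees the full difference vector


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_and_score(train, test, featurize, epochs=200, lr=0.5):
    """Train a logistic-regression probe by full-batch gradient descent; return test accuracy."""
    xs = [featurize(u, v) for u, v, _ in train]
    ys = [y for _, _, y in train]
    n, d = len(xs), len(xs[0])
    # Standardize each feature with the training-set mean and standard deviation.
    mean = [sum(x[j] for x in xs) / n for j in range(d)]
    std = [math.sqrt(sum((x[j] - mean[j]) ** 2 for x in xs) / n) or 1.0 for j in range(d)]

    def scale(x):
        return [(x[j] - mean[j]) / std[j] for j in range(d)]

    xs = [scale(x) for x in xs]
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(d):
                gw[j] += err * x[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    correct = 0
    for u, v, y in test:
        z = sum(wj * xj for wj, xj in zip(w, scale(featurize(u, v)))) + b
        correct += int((z > 0) == (y == 1))
    return correct / len(test)


rng = random.Random(0)
results = {}
for space in ("norm", "direction"):
    train = make_dataset(rng, space, 400)
    test = make_dataset(rng, space, 400)
    for name, feats in (("norm-only", norm_features), ("full-vector", vector_features)):
        acc = train_and_score(train, test, feats)
        results[(space, name)] = acc
        print(f"{space:>9} space, {name:>11} probe: accuracy {acc:.2f}")
```

On the "norm" toy space only the norm-only probe should beat chance, and on the "direction" toy space only the full-vector probe should. Comparing restricted and unrestricted probes in this way is one possible means of separating how much taxonomic information is present from where it sits geometrically.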