A graph-based analysis of semantic types and coercion in contextualized word embeddings
For researchers in computational and lexical semantics, this paper offers a novel tool for analyzing type information in embeddings, though the results are preliminary and domain-specific.
The paper introduces a graph-based method for analyzing how semantic type information is encoded in contextualized word embeddings, focusing on type coercion. Using BERT and sense-enhanced embeddings, the authors propose two metrics, Neighbor Type Probability (NTP) and Neighbor Type Entropy (NTE), and show that graphs built from sense-enhanced embeddings reflect semantic types better and that the metrics distinguish matching from mismatch sentences.
Semantic type mismatch between a noun and its context is central to coercion phenomena. This paper introduces a graph-based method to examine how lexical and contextual type information is reflected in word embeddings. We select nouns from ten semantic types, annotate corpus instances for type matching (matching vs. coercion vs. other mismatch vs. unrestricted), and construct graphs using BERT and sense-enhanced embeddings. Two metrics -- Neighbor Type Probability (NTP) and Neighbor Type Entropy (NTE) -- are proposed to analyze neighborhood type distributions. Results show that graphs constructed with sense-enhanced embeddings reflect semantic type information better, and matching and mismatch sentences can be distinguished through the proposed metrics.
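The abstract names NTP and NTE but does not give their formulas, so the following is only a minimal sketch under stated assumptions: the graph is taken to be a cosine k-nearest-neighbor graph over instance embeddings, NTP is read as the fraction of a node's neighbors carrying each type, and NTE as the Shannon entropy of that distribution. The toy vectors, the type labels "Food" and "Event", and all function names are hypothetical and stand in for real BERT or sense-enhanced embeddings.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_graph(vectors, k):
    """Map each node index to the indices of its k most similar nodes."""
    graph = {}
    for i, u in enumerate(vectors):
        scored = [(cosine(u, v), j) for j, v in enumerate(vectors) if j != i]
        scored.sort(key=lambda s: (-s[0], s[1]))  # most similar first
        graph[i] = [j for _, j in scored[:k]]
    return graph


def neighbor_type_probability(graph, types, node):
    """Assumed NTP: share of the node's neighbors that carry each type."""
    counts = Counter(types[j] for j in graph[node])
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def neighbor_type_entropy(graph, types, node):
    """Assumed NTE: Shannon entropy (bits) of the neighbor type distribution."""
    probs = neighbor_type_probability(graph, types, node)
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


# Hypothetical 2-D "embeddings": nodes 0-2 are Food nouns in matching
# contexts, nodes 3-4 are Event nouns, and node 5 is a Food noun whose
# context pulls it toward the Event region (a coercion-like case).
vectors = [(1.0, 0.0), (0.9, 0.1), (0.8, 0.2), (0.0, 1.0), (0.1, 0.9), (0.4, 0.6)]
types = ["Food", "Food", "Food", "Event", "Event", "Food"]

graph = knn_graph(vectors, k=3)
matching_ntp = neighbor_type_probability(graph, types, 0)
coerced_ntp = neighbor_type_probability(graph, types, 5)
matching_nte = neighbor_type_entropy(graph, types, 0)
coerced_nte = neighbor_type_entropy(graph, types, 5)
```

In this toy setup the matching node's neighbors all share its lexical type, so its entropy is zero, while the coerced node sits among mostly Event neighbors and has a mixed, higher-entropy neighborhood. Whether the paper computes the metrics per node, per type, or per sentence class is not stated in the abstract.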