CL · Jun 11, 2019

A Systematic Comparison of English Noun Compound Representations

arXiv:1906.04772

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of building meaningful representations for noun compounds, which matters in natural language processing for tasks involving rare or unseen phrases. The contribution is incremental: it compares existing methods rather than introducing a new paradigm.

The paper systematically compares methods for representing English noun compounds, covering distributional, compositional, and paraphrase-based approaches. It finds that in most cases composition functions produce higher-quality representations than distributional ones, that they improve with computational power, and that no single function performs best in all scenarios.

Building meaningful representations of noun compounds is not trivial since many of them scarcely appear in the corpus. To that end, composition functions approximate the distributional representation of a noun compound by combining its constituent distributional vectors. In the more general case, phrase embeddings have been trained by minimizing the distance between the vectors representing paraphrases. We compare various types of noun compound representations, including distributional, compositional, and paraphrase-based representations, through a series of tasks and analyses, and with an extensive number of underlying word embeddings. We find that indeed, in most cases, composition functions produce higher quality representations than distributional ones, and they improve with computational power. No single function performs best in all scenarios, suggesting that a joint training objective may produce improved representations.
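
To make the idea of a composition function concrete, the following is a minimal sketch, not the paper's implementation. It assumes a weighted-additive function, alpha * u + beta * v, chosen here only because it is one of the simplest ways to combine two constituent vectors; the abstract does not say which functions were evaluated. The function names (weighted_add, fit_weights, cosine) and the two-dimensional toy vectors are hypothetical. The weights are fit by least squares so that the composed vector approximates an observed distributional vector of the compound.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def weighted_add(u, v, alpha, beta):
    """Compose two constituent vectors as alpha * u + beta * v."""
    return [alpha * a + beta * b for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine similarity, used to compare a composed vector to a target."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def fit_weights(examples):
    """Least-squares fit of (alpha, beta).

    examples: list of (u, v, t), where u and v are the constituent vectors
    and t is the observed distributional vector of the compound.
    Minimizes the sum of ||alpha * u + beta * v - t||^2 by solving the
    2x2 normal equations in closed form.
    """
    suu = svv = suv = sut = svt = 0.0
    for u, v, t in examples:
        suu += dot(u, u)
        svv += dot(v, v)
        suv += dot(u, v)
        sut += dot(u, t)
        svt += dot(v, t)
    det = suu * svv - suv * suv
    if abs(det) < 1e-12:
        raise ValueError("constituent vectors are degenerate; cannot fit")
    alpha = (sut * svv - suv * svt) / det
    beta = (suu * svt - suv * sut) / det
    return alpha, beta


# Toy data (assumption): targets generated as 0.5 * u + 2.0 * v.
examples = [
    ([1.0, 0.0], [0.0, 1.0], [0.5, 2.0]),
    ([1.0, 1.0], [1.0, -1.0], [2.5, -1.5]),
]
alpha, beta = fit_weights(examples)

# Compose a vector for a compound using the fitted weights.
composed = weighted_add([1.0, 1.0], [1.0, -1.0], alpha, beta)
```

Once fit, such a function can produce a vector for a compound that rarely or never appears in the corpus, provided its constituents have vectors. More expressive functions replace the two scalars with learned matrices or a neural network, at higher computational cost.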
