CLJun 9, 2019

Learning to Predict Novel Noun-Noun Compounds

arXiv:1906.03634v21090 citations
Originality Incremental advance
AI Analysis

This addresses the challenge of generating novel concepts in natural language processing, though it is incremental as it builds on existing compositional models.

The paper tackles the problem of predicting plausible but unseen noun-noun compounds using temporally and contextually-aware models trained on observed compounds with negative evidence, achieving around 85% accuracy in generating novel compounds attested in unseen data and an additional estimated 5% plausibility based on human judgments.

We introduce temporally and contextually-aware models for the novel task of predicting unseen but plausible concepts, as conveyed by noun-noun compounds in a time-stamped corpus. We train compositional models on observed compounds, more specifically the composed distributed representations of their constituents across a time-stamped corpus, while giving it corrupted instances (where head or modifier are replaced by a random constituent) as negative evidence. The model captures generalisations over this data and learns what combinations give rise to plausible compounds and which ones do not. After training, we query the model for the plausibility of automatically generated novel combinations and verify whether the classifications are accurate. For our best model, we find that in around 85% of the cases, the novel compounds generated are attested in previously unseen data. An additional estimated 5% are plausible despite not being attested in the recent corpus, based on judgments from independent human raters.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes