Novel Aficionados and Doppelgängers: a referential task for semantic representations of individual entities
The paper addresses a problem in semantic cognition that is relevant to researchers in both linguistics and machine learning. The contribution is incremental: it builds on existing distributional semantics, adding a new task and a new dataset.
The paper asks why proper names are harder to learn and retrieve than common nouns, for both humans and machine learning algorithms, and approaches the question by analyzing the linguistic distributions of the two kinds of words. The results show that distributional representations of different individual entities are less clearly distinguishable from each other than those of common nouns, mirroring human cognition.
In human semantic cognition, proper names (names which refer to individual entities) are harder to learn and retrieve than common nouns. This seems to be the case for machine learning algorithms too, but the linguistic and distributional reasons for this behaviour have not been investigated in depth so far. To tackle this issue, we show that the semantic distinction between proper names and common nouns is reflected in their linguistic distributions by employing an original task for distributional semantics, the Doppelgänger test, an extensive set of models, and a new dataset, the Novel Aficionados dataset. The results indicate that the distributional representations of different individual entities are less clearly distinguishable from each other than those of common nouns, an outcome which intriguingly mirrors human cognition.
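The abstract does not spell out how the Doppelgänger test is scored, so the following is only a rough illustration of what a referential matching task of this general kind might look like, not the authors' procedure. It assumes (none of this is stated above) that each word has two vectors built from disjoint portions of text, and that a word counts as correctly matched when its first vector is most similar, by cosine, to its own second vector. The function names and all vectors are invented toy values, chosen only to show how harder-to-distinguish representations lower the score.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def matching_accuracy(first_half, second_half):
    """Fraction of words whose nearest vector in second_half is their own.

    Both arguments map a word to a vector. This scoring rule is a
    hypothetical stand-in, not the metric used in the paper.
    """
    hits = 0
    for name, vec in first_half.items():
        # Pick the candidate in the other set with the highest similarity.
        best = max(second_half, key=lambda other: cosine(vec, second_half[other]))
        hits += best == name
    return hits / len(first_half)


# Invented toy vectors: the common nouns are well separated ...
common_a = {"dog": [1.0, 0.0, 0.0], "table": [0.0, 1.0, 0.0], "river": [0.0, 0.0, 1.0]}
common_b = {"dog": [0.9, 0.1, 0.0], "table": [0.1, 0.9, 0.1], "river": [0.0, 0.2, 0.8]}

# ... while two of the (hypothetical) individual entities nearly coincide.
entity_a = {"Anna": [1.0, 0.9, 0.0], "Emma": [0.9, 1.0, 0.0], "Jane": [0.0, 0.1, 1.0]}
entity_b = {"Anna": [0.9, 1.0, 0.1], "Emma": [1.0, 0.9, 0.1], "Jane": [0.1, 0.0, 1.0]}

common_score = matching_accuracy(common_a, common_b)
entity_score = matching_accuracy(entity_a, entity_b)
print(common_score, entity_score)
```

On these toy values the common nouns are all matched to their counterparts, whereas the two near-identical entities are swapped with each other. This illustrates the mechanism only and says nothing about the paper's actual results.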