SP-10K: A Large-scale Evaluation Set for Selectional Preference Acquisition
SP-10K provides a standardized benchmark that natural language processing researchers can use to assess selectional preference acquisition, though the contribution is incremental, as it builds on existing evaluation methods.
The authors tackled the need for better evaluation of selectional preference models by introducing SP-10K, a large-scale dataset with human ratings for 10,000 pairs across five relations, and demonstrated its utility by evaluating three methods and linking it to commonsense knowledge and pronoun resolution tasks.
Selectional Preference (SP) is a commonly observed language phenomenon and has proved useful in many natural language processing tasks. To provide a better evaluation method for SP models, we introduce SP-10K, a large-scale evaluation set that provides human ratings for the plausibility of 10,000 SP pairs over five SP relations, covering the 2,500 most frequent verbs, nouns, and adjectives in American English. Three representative SP acquisition methods based on pseudo-disambiguation are evaluated with SP-10K. To demonstrate the importance of our dataset, we investigate the relationship between SP-10K and the commonsense knowledge in ConceptNet5 and show the potential of using SP to represent commonsense knowledge. We also use the Winograd Schema Challenge to prove that the proposed new SP relations are essential for the hard pronoun coreference resolution problem.
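As a rough illustration of how an evaluation set of human plausibility ratings might be used, the sketch below compares a model's scores for a few SP pairs against human ratings using Spearman rank correlation. The choice of Spearman correlation, the rating scale, the word pairs, and all scores are assumptions made for this example; none of them are taken from SP-10K itself.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical (verb, object) pairs with made-up human plausibility ratings.
human_ratings = {
    ("eat", "meal"): 9.5,
    ("eat", "house"): 1.0,
    ("sing", "song"): 9.0,
    ("sing", "rock"): 2.5,
}

# Hypothetical scores from some SP acquisition model for the same pairs.
model_scores = {
    ("eat", "meal"): 0.9,
    ("eat", "house"): 0.2,
    ("sing", "song"): 0.7,
    ("sing", "rock"): 0.1,
}

pairs = list(human_ratings)
rho = spearman(
    [human_ratings[p] for p in pairs],
    [model_scores[p] for p in pairs],
)
print(f"Spearman correlation with human ratings: {rho:.2f}")
```

A higher correlation would suggest that the model orders pairs by plausibility in a way closer to human annotators; with real data, the same comparison could be run separately for each SP relation.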