CL · Apr 5, 2024

A Dataset for Physical and Abstract Plausibility and Sources of Human Disagreement

arXiv:2404.04035
Originality: Synthesis-oriented
AI Analysis

This dataset addresses the need for balanced plausibility assessment in natural language processing, particularly for events that vary in abstractness, although it is an incremental contribution that builds on existing plausibility research.

The authors introduce a dataset for evaluating the physical and abstract plausibility of English events, built from Wikipedia sentences that are automatically perturbed to create pseudo-implausible events. They find that annotators lean toward rating events as plausible and disagree more on implausible events, and that more concrete event participants increase perceived implausibility.
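The disagreement finding can be made concrete with a per-event agreement score. The sketch below is a minimal illustration under stated assumptions, not the authors' analysis: it assumes binary crowd labels (1 = plausible, 0 = implausible), uses label entropy as a stand-in disagreement measure, and runs on invented toy ratings. The event strings and the `label_entropy` and `mean_disagreement` helpers are hypothetical.

```python
import math
from collections import Counter
from statistics import mean


def label_entropy(labels: list[int]) -> float:
    """Entropy in bits of one event's label distribution; 0.0 means full agreement."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def mean_disagreement(ratings: dict[tuple[str, str], list[int]], source: str) -> float:
    """Average label entropy over all events of one source ("attested" or "perturbed")."""
    return mean(label_entropy(v) for (s, _), v in ratings.items() if s == source)


# Toy, invented ratings keyed by (source, event); 1 = plausible, 0 = implausible.
ratings = {
    ("attested", "chef cooks meal"): [1, 1, 1, 1, 1],
    ("attested", "idea shapes policy"): [1, 1, 1, 0, 1],
    ("perturbed", "meal cooks chef"): [0, 0, 1, 0, 1],
    ("perturbed", "policy shapes idea"): [1, 0, 1, 0, 1],
}

for source in ("attested", "perturbed"):
    print(source, round(mean_disagreement(ratings, source), 3))
```

Entropy only measures how split the votes are, not which label wins; reporting the share of plausible votes alongside it would also expose the bias toward plausibility.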

We present a novel dataset for physical and abstract plausibility of events in English. Starting from naturally occurring sentences extracted from Wikipedia, we incorporate varying degrees of abstractness and automatically generate perturbed, pseudo-implausible events. We annotate a filtered and balanced subset for plausibility using crowd-sourcing, and perform extensive cleansing to ensure annotation quality. In-depth quantitative analyses indicate that annotators favor plausibility over implausibility and disagree more on implausible events. Furthermore, our plausibility dataset is the first to capture abstractness in events to the same extent as concreteness, and we find that event abstractness has an impact on plausibility ratings: more concrete event participants trigger a perception of implausibility.
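The generation step can be sketched as follows, assuming events are represented as (subject, verb, object) triples. The two perturbations shown (swapping participants, and borrowing an object from another event) are illustrative stand-ins, since the exact perturbation strategy is not described here. The `Event` type, the helper names, the example events, and the concreteness scores are all hypothetical.

```python
import random
from typing import NamedTuple


class Event(NamedTuple):
    """A hypothetical (subject, verb, object) event triple."""
    subject: str
    verb: str
    obj: str


def swap_participants(event: Event) -> Event:
    """Pseudo-implausible variant: exchange subject and object."""
    return Event(event.obj, event.verb, event.subject)


def replace_object(event: Event, pool: list[Event], rng: random.Random) -> Event:
    """Pseudo-implausible variant: borrow an object noun from a different event."""
    candidates = [e.obj for e in pool if e.obj != event.obj]
    return event._replace(obj=rng.choice(candidates))


# Hypothetical concreteness scores, 1 = very abstract, 5 = very concrete (invented values).
CONCRETENESS = {"chef": 4.8, "meal": 4.6, "court": 3.9,
                "appeal": 2.0, "storm": 4.5, "roof": 4.9}


def participant_concreteness(event: Event, norms: dict[str, float]) -> float:
    """Mean concreteness of subject and object; lower values mean a more abstract event."""
    return (norms[event.subject] + norms[event.obj]) / 2


# Hypothetical attested events standing in for triples parsed from Wikipedia sentences.
events = [
    Event("chef", "cook", "meal"),
    Event("court", "reject", "appeal"),
    Event("storm", "damage", "roof"),
]

rng = random.Random(0)  # fixed seed so the perturbations are reproducible
for e in events:
    print(e, round(participant_concreteness(e, CONCRETENESS), 2),
          "->", swap_participants(e), "|", replace_object(e, events, rng))
```

Averaging participant concreteness is one simple way to place events on a concrete-to-abstract scale; with real data the scores would come from a concreteness lexicon rather than the invented values used here.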

Code Implementations: 1 repo