PROPRES: Investigating the Projectivity of Presupposition with Various Triggers and Environments
This work addresses a gap in natural language understanding research by highlighting human judgment variability and model limitations in pragmatic inference tasks. It is incremental, since it builds on prior work by contributing a new dataset.
The study tackles the problem of variable projectivity in presuppositions by creating the PROPRES dataset, with 12k premise-hypothesis pairs across six triggers and five environments. It reveals that humans show variable projectivity, while the best model, DeBERTa, fails to fully capture this variability.
What makes a presupposition of an utterance (information taken for granted by its speaker) different from other pragmatic inferences such as an entailment is projectivity: the presupposition survives even when the utterance is embedded in an entailment-canceling environment. For example, the negative sentence "the boy did not stop shedding tears" still presupposes that the boy had shed tears before. Projectivity may vary depending on the combination of presupposition trigger and environment. However, prior natural language understanding studies fail to take this into account, as they either use no human baseline or include only negation as an entailment-canceling environment when evaluating models' performance. The current study attempts to address these issues. We introduce a new dataset, projectivity of presupposition (PROPRES), which includes 12k premise-hypothesis pairs crossing six triggers, involving some lexical variety, with five environments. Our human evaluation reveals that humans exhibit variable projectivity in some cases. However, the model evaluation shows that the best-performing model, DeBERTa, does not fully capture this variability. Our findings suggest that probing studies on pragmatic inferences should take extra care with human judgment variability and with the combination of linguistic items.