Modeling Event Plausibility with Consistent Conceptual Abstraction
This work addresses a common-sense reasoning bottleneck in NLP, but the contribution is incremental because it builds on existing models.
The paper shows that Transformer-based language models give inconsistent event plausibility judgments across the conceptual classes of a lexical hierarchy, and presents a post-hoc consistency method that improves correlation with human judgments.
Understanding natural language requires common sense, one aspect of which is the ability to discern the plausibility of events. While distributional models, most recently pre-trained Transformer language models, have demonstrated improvements in modeling event plausibility, their performance still falls short of humans'. In this work, we show that Transformer-based plausibility models are markedly inconsistent across the conceptual classes of a lexical hierarchy, inferring, for example, that "a person breathing" is plausible while "a dentist breathing" is not. We find this inconsistency persists even when models are softly injected with lexical knowledge, and we present a simple post-hoc method of forcing model consistency that improves correlation with human plausibility judgments.
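
To make the idea of post-hoc consistency concrete, here is a minimal sketch of one way it could be enforced. The abstract does not spell out the rule, so this is an assumption rather than the authors' method: each concept's score is replaced by the maximum raw score over the concept and its hypernym ancestors, so a more specific class is never judged less plausible than a class that contains it. The hierarchy, the scores, and the names `HYPERNYM`, `RAW_SCORES`, `ancestors`, and `consistent_score` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical toy hierarchy: concept -> its hypernym (None marks the root).
HYPERNYM: Dict[str, Optional[str]] = {
    "dentist": "person",
    "person": "organism",
    "organism": None,
}

# Hypothetical raw model scores for the event "<concept> breathing".
# These numbers are invented for illustration, not taken from the paper.
RAW_SCORES: Dict[str, float] = {
    "dentist": 0.31,
    "person": 0.92,
    "organism": 0.88,
}


def ancestors(concept: str, hypernym: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain of hypernyms above `concept`, nearest first."""
    chain: List[str] = []
    parent = hypernym.get(concept)
    while parent is not None:
        chain.append(parent)
        parent = hypernym.get(parent)
    return chain


def consistent_score(
    concept: str,
    raw: Dict[str, float],
    hypernym: Dict[str, Optional[str]],
) -> float:
    """Assumed rule: take the max raw score over the concept and its ancestors."""
    candidates = [concept] + ancestors(concept, hypernym)
    return max(raw[c] for c in candidates if c in raw)


# Apply the adjustment to every concept that has a raw score.
adjusted = {c: consistent_score(c, RAW_SCORES, HYPERNYM) for c in RAW_SCORES}

for concept, score in adjusted.items():
    print(f"{concept} breathing: raw={RAW_SCORES[concept]:.2f} adjusted={score:.2f}")
```

Under this assumed rule, "a dentist breathing" inherits the higher score of "a person breathing", which removes the inconsistency in the example above. Other aggregation rules (for instance, averaging over ancestors) would be equally easy to swap in; the max is used here only because it guarantees that a subclass never scores below its superclass.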