CL, AI · Feb 19, 2025

Estimating Commonsense Plausibility through Semantic Shifts

Wanqing Cui, Keping Bi, Jiafeng Guo, Xueqi Cheng

arXiv:2502.13464v1 · 2.7 · h-index: 50

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of evaluating language models on commonsense plausibility. The advance is incremental: it introduces a new discriminative method for an existing bottleneck rather than opening a new problem.

The paper tackles fine-grained commonsense plausibility estimation for language models by proposing ComPaSS, a discriminative framework that measures the semantic shift induced by augmenting a sentence with commonsense-related information. ComPaSS consistently outperforms baselines across tasks and backbones, and VLM backbones outperform LM backbones on vision-grounded tasks.

Commonsense plausibility estimation is critical for evaluating language models (LMs), yet existing generative approaches, which rely on likelihoods or verbalized judgments, struggle with fine-grained discrimination. In this paper, we propose ComPaSS, a novel discriminative framework that quantifies commonsense plausibility by measuring semantic shifts when augmenting sentences with commonsense-related information. Plausible augmentations induce minimal shifts in semantics, while implausible ones result in substantial deviations. Evaluations on two types of fine-grained commonsense plausibility estimation tasks across different backbones, including LLMs and vision-language models (VLMs), show that ComPaSS consistently outperforms baselines. This demonstrates the advantage of discriminative approaches over generative methods in fine-grained commonsense plausibility evaluation. Experiments also show that (1) VLMs, when integrated with ComPaSS, yield superior performance to LMs on vision-grounded commonsense tasks, and (2) contrastive pre-training sharpens backbone models' ability to capture semantic nuances, thereby further enhancing ComPaSS.
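The semantic-shift idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes plausibility is scored as the cosine similarity between an embedding of the base sentence and an embedding of the augmented sentence, and the names `plausibility_score`, `rank_augmentations`, `toy_encode`, and the hand-written `TOY_VECTORS` are all hypothetical. The toy encoder exists only to make the example runnable; a real setup would presumably pass in a sentence encoder built on an LM or VLM backbone.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Encoder = Callable[[str], Vector]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def plausibility_score(base: str, augmented: str, encode: Encoder) -> float:
    """Assumed scoring rule: a smaller semantic shift (higher similarity
    between base and augmented embeddings) means a more plausible augmentation."""
    return cosine_similarity(encode(base), encode(augmented))


def rank_augmentations(
    base: str, augmented: List[str], encode: Encoder
) -> List[Tuple[str, float]]:
    """Sort augmented sentences from most to least plausible under the rule above."""
    scored = [(s, plausibility_score(base, s, encode)) for s in augmented]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical hand-written word vectors, for illustration only.
TOY_VECTORS = {
    "banana": [1.0, 0.1, 0.0],
    "yellow": [0.9, 0.2, 0.0],
    "purple": [0.0, 0.2, 1.0],
}


def toy_encode(sentence: str) -> List[float]:
    """Stand-in encoder: sums the toy vectors of known words, ignores the rest."""
    total = [0.0, 0.0, 0.0]
    for word in sentence.lower().split():
        for i, value in enumerate(TOY_VECTORS.get(word, [0.0, 0.0, 0.0])):
            total[i] += value
    return total


if __name__ == "__main__":
    for sentence, score in rank_augmentations(
        "banana", ["purple banana", "yellow banana"], toy_encode
    ):
        print(f"{sentence}: {score:.3f}")
```

Under these toy vectors, "yellow banana" stays closer to "banana" than "purple banana" does, so it ranks as the more plausible augmentation. Passing the encoder as a parameter keeps the scoring rule independent of the backbone, which mirrors the abstract's comparison across LM and VLM backbones.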
