CL AI Nov 6, 2024

What Really is Commonsense Knowledge?

Quyet V. Do, Junze Li, Tung-Duong Vuong, Zhaowei Wang, Yangqiu Song, Xiaojuan Ma

arXiv:2411.03964v12.72 citations h-index: 10

Originality Synthesis-oriented

AI Analysis

This work addresses a critical issue for NLP researchers by clarifying the validity of commonsense benchmarks, though it is incremental in that it builds on existing debates and datasets.

The study tackled the problem of ambiguous definitions in commonsense knowledge benchmarks by proposing a unified definition and applying it to CommonsenseQA datasets, revealing that a large portion of instances are not commonsense-related and that LLMs perform worse on true commonsense instances.

Commonsense datasets have been well developed in Natural Language Processing, mainly through crowdsourced human annotation. However, there are debates about the genuineness of commonsense reasoning benchmarks. Specifically, a significant portion of instances in some commonsense benchmarks do not concern commonsense knowledge. That problem would undermine the measurement of the true commonsense reasoning ability of evaluated models. It has also been suggested that the problem originates from a blurry concept of commonsense knowledge, as distinguished from other types of knowledge. To examine these claims, in this study we survey existing definitions of commonsense knowledge, ground them in the three frameworks for defining concepts, and consolidate them into a multi-framework unified definition of commonsense knowledge (the consolidated definition). We then use the consolidated definition for annotations and experiments on the CommonsenseQA and CommonsenseQA 2.0 datasets. Our study shows that there is a large portion of non-commonsense-knowledge instances in the two datasets, and a large performance gap between the two subsets, with Large Language Models (LLMs) performing worse on commonsense-knowledge instances.
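
The subset comparison described in the abstract can be illustrated with a minimal sketch. It assumes each benchmark instance has already been annotated as commonsense or non-commonsense and paired with a model prediction. The record keys (`is_commonsense`, `gold`, `pred`) and the function names are hypothetical, chosen only for illustration, and the toy records are invented rather than taken from the paper.

```python
from collections import defaultdict


def accuracy_by_subset(records):
    """Return {subset_label: accuracy} for an iterable of records.

    Each record is a dict with hypothetical keys:
      'is_commonsense' (bool, assumed to come from human annotation),
      'gold' (reference answer), and 'pred' (model answer).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        label = "commonsense" if r["is_commonsense"] else "non_commonsense"
        total[label] += 1
        correct[label] += int(r["pred"] == r["gold"])
    return {label: correct[label] / total[label] for label in total}


def accuracy_gap(records):
    """Non-commonsense accuracy minus commonsense accuracy.

    A positive value means the model does worse on commonsense instances.
    """
    acc = accuracy_by_subset(records)
    return acc["non_commonsense"] - acc["commonsense"]


# Toy records, invented purely for illustration (not data from the paper).
toy = [
    {"is_commonsense": True, "gold": "A", "pred": "A"},
    {"is_commonsense": True, "gold": "B", "pred": "C"},
    {"is_commonsense": False, "gold": "yes", "pred": "yes"},
    {"is_commonsense": False, "gold": "no", "pred": "no"},
]

print(accuracy_by_subset(toy))
print(accuracy_gap(toy))
```

Reporting accuracy per subset, rather than a single pooled score, is what makes such a gap visible: a pooled accuracy would average over both kinds of instances and hide how the model does on each.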
