CL | AI | Nov 6, 2024

What Really is Commonsense Knowledge?

arXiv:2411.03964v1 | 2 citations | h-index: 10
Originality: Synthesis-oriented
AI Analysis

This paper addresses a critical issue for NLP researchers by clarifying benchmark validity, though its contribution is incremental because it builds on existing debates and datasets.

The study tackles the problem of ambiguous definitions in commonsense knowledge benchmarks by proposing a unified definition and applying it to the CommonsenseQA and CommonsenseQA 2.0 datasets. It finds that a large portion of instances do not involve commonsense knowledge and that LLMs perform worse on the instances that do.

Commonsense datasets have been well developed in Natural Language Processing, mainly through crowdsourced human annotation. However, there are debates about the genuineness of commonsense reasoning benchmarks. Specifically, a significant portion of instances in some commonsense benchmarks do not concern commonsense knowledge. This problem would undermine the measurement of the true commonsense reasoning ability of the evaluated models. It has also been suggested that the problem originates from a blurry concept of commonsense knowledge, one not clearly distinguished from other types of knowledge. To examine these claims, in this study we survey existing definitions of commonsense knowledge, ground them in the three frameworks for defining concepts, and consolidate them into a multi-framework unified definition of commonsense knowledge (the so-called consolidated definition). We then use the consolidated definition for annotations and experiments on the CommonsenseQA and CommonsenseQA 2.0 datasets. Our study shows that a large portion of the instances in the two datasets do not involve commonsense knowledge, and that there is a large performance gap between the commonsense-knowledge and non-commonsense-knowledge subsets, with Large Language Models (LLMs) performing worse on commonsense-knowledge instances.

