CL · Apr 20, 2023

CKBP v2: Better Annotation and Reasoning for Commonsense Knowledge Base Population

Tianqing Fang, Quyet V. Do, Zihao Zheng, Weiqi Wang, Sehyun Choi, Zhaowei Wang, Yangqiu Song

Tencent

arXiv:2304.10392v23.611 citations · h-index: 52 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for more reliable evaluation in commonsense knowledge acquisition for NLP researchers, though it is incremental as it builds on prior frameworks.

The paper improves the quality and representativeness of evaluation data for Commonsense Knowledge Base Population by introducing CKBP v2, which uses expert annotations and adversarial samples. Knowledge acquired with a population model selected on its development set supports a DeBERTa-v3-large question-answering model that outperforms large language models such as ChatGPT and GPT-3.5 in zero-shot question answering.

Commonsense Knowledge Bases (CSKB) Population, which aims at automatically expanding knowledge in CSKBs with external resources, is an important yet hard task in NLP. Fang et al. (2021a) proposed a CSKB Population (CKBP) framework with an evaluation set CKBP v1. However, CKBP v1 relies on crowdsourced annotations that suffer from a considerable number of mislabeled answers, and the evaluation set lacks alignment with the external knowledge source due to random sampling. In this paper, we introduce CKBP v2, a new high-quality CSKB Population evaluation set that addresses the two aforementioned issues by employing domain experts as annotators and incorporating diversified adversarial samples to make the evaluation data more representative. We show that CKBP v2 serves as a challenging and representative evaluation dataset for the CSKB Population task, while its development set aids in selecting a population model that leads to improved knowledge acquisition for downstream commonsense reasoning. A better population model can also help acquire more informative commonsense knowledge as additional supervision signals for both generative commonsense inference and zero-shot commonsense question answering. Specifically, the question-answering model based on DeBERTa-v3-large (He et al., 2023b) even outperforms powerful large language models in a zero-shot setting, including ChatGPT and GPT-3.5.
