CL · AI · Oct 10, 2022

Do Children Texts Hold The Key To Commonsense Knowledge?

arXiv:2210.04530v1291 citations
h-index: 10
Originality: Incremental advance
AI Analysis

This paper addresses reporting bias in AI commonsense knowledge, a concern for researchers and practitioners alike, and offers an incremental but refreshing alternative to scaling up models.

The paper tackles commonsense knowledge compilation in AI by testing whether children's texts contain more explicit commonsense assertions. It finds that they do, and that task-unspecific fine-tuning on small amounts of such texts yields significant improvements in language-model-based extraction tasks.

Compiling comprehensive repositories of commonsense knowledge is a long-standing problem in AI. Many concerns revolve around the issue of reporting bias, i.e., that frequency in text sources is not a good proxy for relevance or truth. This paper explores whether children's texts hold the key to commonsense knowledge compilation, based on the hypothesis that such content makes fewer assumptions on the reader's knowledge, and therefore spells out commonsense more explicitly. An analysis with several corpora shows that children's texts indeed contain much more, and more typical commonsense assertions. Moreover, experiments show that this advantage can be leveraged in popular language-model-based commonsense knowledge extraction settings, where task-unspecific fine-tuning on small amounts of children texts (childBERT) already yields significant improvements. This provides a refreshing perspective different from the common trend of deriving progress from ever larger models and corpora.

