CL | AI | Oct 14, 2022

Extracting Cultural Commonsense Knowledge at Scale

arXiv:2210.07763v3 | 104 citations | h-index: 96
Originality: Incremental advance
AI Analysis

The paper addresses the need of situative AI for socio-cultural context in human-centric applications, though the advance is incremental because it builds on existing knowledge extraction methods.

The paper tackles the lack of cultural commonsense knowledge in AI by introducing CANDLE, an end-to-end method that extracts and organizes high-quality cultural commonsense assertions from a large web corpus for domains like geography and religion, showing superiority over prior works and benefits for GPT-3.

Structured knowledge is important for many AI applications. Commonsense knowledge, which is crucial for robust human-centric AI, is covered by a small number of structured knowledge projects. However, they lack knowledge about human traits and behaviors conditioned on socio-cultural contexts, which is crucial for situative AI. This paper presents CANDLE, an end-to-end methodology for extracting high-quality cultural commonsense knowledge (CCSK) at scale. CANDLE extracts CCSK assertions from a huge web corpus and organizes them into coherent clusters, for 3 domains of subjects (geography, religion, occupation) and several cultural facets (food, drinks, clothing, traditions, rituals, behaviors). CANDLE includes judicious techniques for classification-based filtering and scoring of interestingness. Experimental evaluations show the superiority of the CANDLE CCSK collection over prior works, and an extrinsic use case demonstrates the benefits of CCSK for the GPT-3 language model. Code and data can be accessed at https://candle.mpi-inf.mpg.de/.

Code Implementations: 2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
