LGAI · Oct 16, 2023

Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms

AI2 · Stanford
arXiv:2310.10418v2 · 135 citations · h-index: 31
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving machine understanding of context-dependent norms in embodied scenarios. It is an incremental advance, since it builds on existing multimodal and commonsense reasoning work.

The paper tackles visually grounded reasoning about defeasible commonsense norms, for example recognizing that reading a book is usually fine but not while driving. To study this, the authors construct the NORMLENS benchmark of 10K human judgments over 2K multimodal situations. They find that state-of-the-art models are not well aligned with human judgments or explanations.

Commonsense norms are defeasible by context: reading books is usually great, but not when driving a car. While contexts can be explicitly described in language, in embodied scenarios, contexts are often provided visually. This type of visually grounded reasoning about defeasible commonsense norms is generally easy for humans, but (as we show) poses a challenge for machines, as it necessitates both visual understanding and reasoning about commonsense norms. We construct a new multimodal benchmark for studying visually grounded commonsense norms: NORMLENS. NORMLENS consists of 10K human judgments accompanied by free-form explanations covering 2K multimodal situations, and serves as a probe to address two questions: (1) to what extent can models align with average human judgment? and (2) how well can models explain their predicted judgments? We find that state-of-the-art model judgments and explanations are not well aligned with human annotation. Additionally, we present a new approach to better align models with humans by distilling social commonsense knowledge from large language models. The data and code are released at https://seungjuhan.me/normlens.
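To make question (1) concrete: 10K judgments over 2K situations works out to five human judgments per situation on average, so "average human judgment" can be read as an aggregate over several annotations. The sketch below assumes a simple majority-vote aggregate scored with plain accuracy. The label names (`okay`, `wrong`, `impossible`), the function names, the tie-breaking rule, and the toy data are all hypothetical and are not taken from the NORMLENS release or from the paper's actual evaluation protocol.

```python
from collections import Counter

# Hypothetical label set; the real NORMLENS label space may differ.
LABELS = ("okay", "wrong", "impossible")


def majority_label(judgments):
    """Return the most common label among one situation's human judgments.

    Ties are broken by the order of LABELS (an assumption made for this sketch).
    """
    counts = Counter(judgments)
    best = max(counts.values())  # raises ValueError if judgments is empty
    for label in LABELS:
        if counts.get(label, 0) == best:
            return label
    raise ValueError("majority label is outside LABELS")


def alignment_accuracy(situations, predictions):
    """Fraction of situations where the model's label matches the human majority."""
    if not situations:
        raise ValueError("no situations to score")
    hits = sum(
        predictions[sid] == majority_label(judgments)
        for sid, judgments in situations.items()
    )
    return hits / len(situations)


# Toy data (hypothetical): five judgments per situation, one model prediction each.
situations = {
    "s1": ["wrong", "wrong", "wrong", "okay", "wrong"],  # e.g. reading a book while driving
    "s2": ["okay", "okay", "okay", "okay", "wrong"],     # e.g. reading a book in a library
}
predictions = {"s1": "okay", "s2": "okay"}
print(alignment_accuracy(situations, predictions))  # 0.5
```

Majority vote discards how strongly annotators agree. An evaluation could instead keep only high-agreement situations or compare against the full label distribution; which of these the paper uses is not stated on this page.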

Code Implementations: 1 repo
