CV · AI · Apr 12, 2025

Visual moral inference and communication

U of Toronto
arXiv:2504.11473v1 · 1 citation · h-index: 3 · CogSci
AI Analysis

This work addresses multimodal moral inference for AI systems, enabling automated inference and pattern discovery in visual moral communication. It is incremental, however, because it builds on existing language-vision fusion methods.

The paper tackles automated moral inference, which typically relies on textual input, by developing a computational framework for moral inference from natural images. It shows that language-vision fusion models capture human moral judgments of images more precisely than text-only models. Applications to news data revealed implicit biases in news categories and geopolitical discussions.

Humans can make moral inferences from multiple sources of input. In contrast, automated moral inference in artificial intelligence typically relies on language models with textual input. However, morality is conveyed through modalities beyond language. We present a computational framework that supports moral inference from natural images, demonstrated in two related tasks: 1) inferring human moral judgment toward visual images and 2) analyzing patterns in moral content communicated via images from public news. We find that models based on text alone cannot capture the fine-grained human moral judgment toward visual stimuli, but language-vision fusion models offer better precision in visual moral inference. Furthermore, applications of our framework to news data reveal implicit biases in news categories and geopolitical discussions. Our work creates avenues for automating visual moral inference and discovering patterns of visual moral communication in public media.
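The central comparison above (text-only models versus language-vision fusion models for predicting human moral ratings of images) can be illustrated with a small sketch. This is not the authors' method: the data are synthetic, the feature vectors are hypothetical stand-ins for caption embeddings and image embeddings (in practice these would come from text and image encoders), fusion is assumed to be simple concatenation, and the predictor is assumed to be ridge regression evaluated by held-out mean squared error. The names `fit_ridge`, `make_items`, and `features` are illustrative only.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(xs, ys, lam=1e-3):
    """Ridge regression with a bias term: solve (X^T X + lam*I) w = X^T y."""
    rows = [[1.0] + x for x in xs]  # prepend a constant bias feature
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(xtx, xty)


def mse(w, xs, ys):
    """Mean squared error of the linear model w on (xs, ys)."""
    preds = [w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def make_items(n, rng, dim=3):
    """Hypothetical stimuli: (caption vector, image vector, moral rating).
    The rating depends partly on visual content the caption does not encode."""
    items = []
    for _ in range(n):
        text = [rng.gauss(0, 1) for _ in range(dim)]
        image = [rng.gauss(0, 1) for _ in range(dim)]
        rating = 0.8 * text[0] + 1.5 * image[1] - 0.7 * image[2] + rng.gauss(0, 0.3)
        items.append((text, image, rating))
    return items


def features(items, use_image):
    """Text-only features, or text and image features fused by concatenation."""
    return [t + v if use_image else t for t, v, _ in items]


rng = random.Random(0)
train, test = make_items(300, rng), make_items(100, rng)
ys_train = [r for _, _, r in train]
ys_test = [r for _, _, r in test]

w_text = fit_ridge(features(train, False), ys_train)
w_fused = fit_ridge(features(train, True), ys_train)
text_mse = mse(w_text, features(test, False), ys_test)
fused_mse = mse(w_fused, features(test, True), ys_test)
print(f"text-only MSE: {text_mse:.3f}  fusion MSE: {fused_mse:.3f}")
```

Because the synthetic ratings are built to depend on image content that the caption vectors do not carry, the text-only model leaves that variance unexplained, while the fused model can recover it. This mirrors the qualitative claim in the abstract under these stated assumptions; it says nothing about the magnitude of the effect in the paper's data.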
