AI · Mar 1

MMCOMET: A Large-Scale Multimodal Commonsense Knowledge Graph for Contextual Reasoning

arXiv:2603.01055v1 · 1 citation · h-index: 8
Originality: Incremental advance
AI Analysis

MMCOMET addresses a major limitation for AI researchers and practitioners by providing a foundational resource for multimodal commonsense reasoning and narrative generation. The advance is incremental, however, because it extends an existing knowledge graph rather than building a new one.

The authors address the shortage of multimodal commonsense knowledge for complex reasoning by building MMCOMET, a large-scale multimodal commonsense knowledge graph with over 900K triples. In a visual storytelling experiment, stories generated with MMCOMET were richer and more coherent than those generated with text-only knowledge.

We present MMCOMET, the first multimodal commonsense knowledge graph (MMKG) that integrates physical, social, and eventive knowledge. MMCOMET extends the ATOMIC2020 knowledge graph to include a visual dimension through an efficient image retrieval process, resulting in over 900K multimodal triples. This new resource addresses a major limitation of existing MMKGs in supporting complex reasoning tasks like image captioning and storytelling. Through a standard visual storytelling experiment, we show that our holistic approach enables the generation of richer, more coherent, and more contextually grounded stories than those produced using text-only knowledge. This resource establishes a new foundation for multimodal commonsense reasoning and narrative generation.
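
To make the idea of a "multimodal triple" concrete, here is a minimal Python sketch of what one record might look like. It is an illustration only, not MMCOMET's actual schema or retrieval pipeline, neither of which is described in the abstract. The relation names are a small subset of real ATOMIC2020 relations, but the class `MultimodalTriple`, the field `image_ids`, and the functions `attach_images` and `toy_retrieve` are all hypothetical names introduced here.

```python
from dataclasses import dataclass

# Illustrative subset of ATOMIC2020 relations, grouped into the three
# knowledge types named in the abstract. The grouping labels are assumptions.
RELATION_CATEGORY = {
    "ObjectUse": "physical",
    "AtLocation": "physical",
    "xWant": "social",
    "xReact": "social",
    "isAfter": "eventive",
    "Causes": "eventive",
}


@dataclass(frozen=True)
class MultimodalTriple:
    """A (head, relation, tail) triple plus references to retrieved images."""
    head: str
    relation: str
    tail: str
    image_ids: tuple = ()  # hypothetical identifiers of retrieved images

    @property
    def category(self) -> str:
        return RELATION_CATEGORY.get(self.relation, "unknown")


def attach_images(triple, retrieve, k=2):
    """Return a copy of `triple` carrying up to k image ids from `retrieve`."""
    query = f"{triple.head} {triple.tail}"
    return MultimodalTriple(
        triple.head, triple.relation, triple.tail, tuple(retrieve(query)[:k])
    )


def toy_retrieve(query):
    # Stand-in for a real image retriever: derives made-up ids from the query.
    return [f"img:{word}" for word in query.split()]
```

For example, `attach_images(MultimodalTriple("PersonX bakes a cake", "xWant", "to eat the cake"), toy_retrieve)` returns a triple in the "social" category with two placeholder image ids. A real system would replace `toy_retrieve` with an actual image search over a corpus.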

