CV · Dec 19, 2022

MetaCLUE: Towards Comprehensive Visual Metaphors Research

DeepMind, IBM
arXiv:2212.09898v3 · 47 citations · h-index: 76
Originality: Incremental advance
AI Analysis

This work gives researchers and developers a foundational dataset and benchmarks for visual metaphor understanding, an area where current AI systems show limited creative capability. The advance is incremental, since it builds on existing vision-language methods.

The paper tackles the lack of research on visual metaphors in computer vision by introducing MetaCLUE, a set of tasks and a dataset with high-quality annotations for evaluating metaphorical comprehension in images. It analyzes state-of-the-art models, revealing their strengths and weaknesses in tasks like classification, localization, and generation.

Creativity is an indispensable part of human cognition and also an inherent part of how we make sense of the world. Metaphorical abstraction is fundamental in communicating creative ideas through nuanced relationships between abstract concepts such as feelings. While computer vision benchmarks and approaches predominantly focus on understanding and generating literal interpretations of images, metaphorical comprehension of images remains relatively unexplored. Towards this goal, we introduce MetaCLUE, a set of vision tasks on visual metaphor. We also collect high-quality and rich metaphor annotations (abstract objects, concepts, relationships along with their corresponding object boxes) as there do not exist any datasets that facilitate the evaluation of these tasks. We perform a comprehensive analysis of state-of-the-art models in vision and language based on our annotations, highlighting strengths and weaknesses of current approaches in visual metaphor Classification, Localization, Understanding (retrieval, question answering, captioning) and gEneration (text-to-image synthesis) tasks. We hope this work provides a concrete step towards developing AI systems with human-like creative capabilities.

