CLCVIRApr 18, 2018

Quantifying the visual concreteness of words and topics in multimodal datasets

arXiv:1804.06786v21105 citations
Originality Incremental advance
AI Analysis

This work addresses the challenge of understanding concept learnability in multimodal machine learning, providing a tool for researchers, but it is incremental as it builds on prior insights about concreteness.

The paper tackles the problem of quantifying visual concreteness in multimodal datasets to assess how it affects learning visual-textual correspondences, finding that concrete concepts are easier to learn and algorithms have similar failure cases, with performance varying by dataset.

Multimodal machine learning algorithms aim to learn visual-textual correspondences. Previous work suggests that concepts with concrete visual manifestations may be easier to learn than concepts with abstract ones. We give an algorithm for automatically computing the visual concreteness of words and topics within multimodal datasets. We apply the approach in four settings, ranging from image captions to images/text scraped from historical books. In addition to enabling explorations of concepts in multimodal datasets, our concreteness scores predict the capacity of machine learning algorithms to learn textual/visual relationships. We find that 1) concrete concepts are indeed easier to learn; 2) the large number of algorithms we consider have similar failure cases; 3) the precise positive relationship between concreteness and performance varies between datasets. We conclude with recommendations for using concreteness scores to facilitate future multimodal research.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes