CL, AI, CV | Jul 6, 2017

Cross-linguistic differences and similarities in image descriptions

arXiv:1707.01736v2 | 1095 citations
AI Analysis

This paper addresses dataset bias in multilingual image description systems, a concern for NLP researchers. It is incremental, however: it extends existing work to new languages without a major methodological breakthrough.

The paper compares Dutch, English, and German image description datasets to explore cross-linguistic differences. It finds that the descriptions are largely similar, but that their specificity is influenced by crowd workers' familiarity with the subjects of the images.

Automatic image description systems are commonly trained and evaluated on large image description datasets. Recently, researchers have started to collect such datasets for languages other than English. An unexplored question is how different these datasets are from English and, if there are any differences, what causes them to differ. This paper provides a cross-linguistic comparison of Dutch, English, and German image descriptions. We find that these descriptions are similar in many respects, but the familiarity of crowd workers with the subjects of the images has a noticeable influence on description specificity.

Code Implementations: 1 repo
