CL, CV · May 19, 2016

Stereotyping and Bias in the Flickr30K Dataset

arXiv:1605.06083v1 · 98 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses dataset bias for researchers and practitioners in computer vision and NLP, but it is incremental, building on prior critiques of crowdsourced data.

The paper challenges the assumption that Flickr30K descriptions draw only on what is visible in the image. It identifies biases and unwarranted inferences in the descriptions, then discusses methods for detecting them and how to handle stereotype-driven descriptions.

An untested assumption behind the crowdsourced descriptions of the images in the Flickr30K dataset (Young et al., 2014) is that they "focus only on the information that can be obtained from the image alone" (Hodosh et al., 2013, p. 859). This paper presents some evidence against this assumption, and provides a list of biases and unwarranted inferences that can be found in the Flickr30K dataset. Finally, it considers methods to find examples of these, and discusses how we should deal with stereotype-driven descriptions in future applications.
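The abstract says the paper considers methods for finding examples of these biases, but does not spell them out here. As a purely hypothetical illustration (not the paper's own method), one simple way to surface candidate cases is a keyword search over the captions: flag captions where an ethnicity-marking adjective directly precedes a person noun, then compute, per image, the fraction of captions that do so. The word lists, the function names (`parse_token_file`, `marked_mentions`, `marking_rate`), and the caption-file layout (one `image.jpg#k<TAB>caption` entry per line) are all assumptions made for this sketch.

```python
import re
from collections import defaultdict

# Hypothetical, illustrative word lists; NOT taken from the paper.
MARKERS = {"white", "black", "asian", "african", "caucasian", "hispanic"}
PERSON_NOUNS = {"man", "woman", "men", "women", "boy", "girl", "baby",
                "child", "kid", "kids", "guy", "lady", "person", "people"}


def parse_token_file(lines):
    """Group captions by image id, assuming 'image.jpg#k<TAB>caption' lines."""
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, caption = line.split("\t", 1)
        image_id = key.split("#", 1)[0]  # drop the '#k' caption index
        captions[image_id].append(caption)
    return captions


def marked_mentions(caption):
    """Return (marker, noun) pairs where a marker word directly precedes a person noun."""
    tokens = re.findall(r"[a-z]+", caption.lower())
    return [(a, b) for a, b in zip(tokens, tokens[1:])
            if a in MARKERS and b in PERSON_NOUNS]


def marking_rate(captions):
    """Fraction of each image's captions containing at least one marked mention."""
    rates = {}
    for image_id, caps in captions.items():
        marked = sum(1 for c in caps if marked_mentions(c))
        rates[image_id] = marked / len(caps)
    return rates


if __name__ == "__main__":
    sample = [
        "1.jpg#0\tA white baby plays with a toy .",
        "1.jpg#1\tA baby plays with a toy .",
        "2.jpg#0\tA man in a black shirt walks a dog .",
    ]
    print(marking_rate(parse_token_file(sample)))  # {'1.jpg': 0.5, '2.jpg': 0.0}
```

An image with a rate between 0 and 1 is one where only some annotators chose to mark the person's ethnicity, which makes it a candidate for manual inspection. Adjacent-bigram matching is deliberately crude: it ignores clothing colours such as "black shirt", but it also misses phrasings like "an Asian-American woman" and cannot tell which person in the image is being described.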

Code Implementations: 2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
