CL, AI, CV · Mar 10, 2018

Face2Text: Collecting an Annotated Image Description Corpus for the Generation of Rich Face Descriptions

arXiv:1803.03827v21093 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses a domain-specific problem in vision-language research by providing a new dataset for face description generation. It is incremental in that it focuses on data collection rather than method development.

The authors address the lack of data for generating detailed face descriptions from images by crowdsourcing an annotated corpus. They find that descriptions mix physical, emotional, and inferential attributes, a mixture that poses challenges for existing image-to-text methods.

The past few years have witnessed renewed interest in NLP tasks at the interface between vision and language. One intensively-studied problem is that of automatically generating text from images. In this paper, we extend this problem to the more specific domain of face description. Unlike scene descriptions, face descriptions are more fine-grained and rely on attributes extracted from the image, rather than objects and relations. Given that no data exists for this task, we present an ongoing crowdsourcing study to collect a corpus of descriptions of face images taken 'in the wild'. To gain a better understanding of the variation we find in face description and the possible issues that this may raise, we also conducted an annotation study on a subset of the corpus. Primarily, we found descriptions to refer to a mixture of attributes, not only physical, but also emotional and inferential, which is bound to create further challenges for current image-to-text methods.