VIP: Finding Important People in Images
This work addresses the need for better image analysis and description in contexts such as event photography, but its contribution is incremental because it builds on existing saliency and im2text methods.
The paper tackles two problems: identifying the most important individuals in a group photograph, and predicting which of several images depicts a given person in the most important role. It finds that importance can be predicted automatically from purely visual cues, and that incorporating the predicted importance improves applications such as im2text.
People preserve memories of events such as birthdays, weddings, or vacations by capturing photos, often depicting groups of people. Invariably, some individuals in the image are more important than others given the context of the event. This paper analyzes the concept of the importance of individuals in group photographs. We address two specific questions: Given an image, who are the most important individuals in it? Given multiple images of a person, which image depicts the person in the most important role? We introduce a measure of importance of people in images and investigate the correlation between importance and visual saliency. We find that not only can we automatically predict the importance of people from purely visual cues, but incorporating this predicted importance also results in significant improvement in applications such as im2text (generating sentences that describe images of groups of people).