CV | Oct 16, 2024

Context-Infused Visual Grounding for Art

arXiv:2410.12369v1 | 4 citations | h-index: 13 | ECCV Workshops
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of localizing objects in art images, a task relevant to art historians and curators. It is an incremental improvement that adapts existing methods to a new domain.

The paper tackles visual grounding in artwork images, where existing methods trained on natural images perform poorly. It introduces CIGAr, which uses artwork descriptions as context during training, and reports new state-of-the-art object detection results on two artwork datasets.

Many artwork collections contain textual attributes that provide rich and contextualised descriptions of artworks. Visual grounding offers the potential for localising subjects within these descriptions on images; however, existing approaches are trained on natural images and generalise poorly to art. In this paper, we present CIGAr (Context-Infused GroundingDINO for Art), a visual grounding approach which utilises the artwork descriptions during training as context, thereby enabling visual grounding on art. In addition, we present a new dataset, Ukiyo-eVG, with manually annotated phrase-grounding annotations, and we set a new state-of-the-art for object detection on two artwork datasets.

Code Implementations: 1 repo