CV · AI · Dec 29, 2025

Holi-DETR: Holistic Fashion Item Detection Leveraging Contextual Information

arXiv:2512.23221v1 · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses fashion item detection for computer vision applications, representing an incremental advance by integrating contextual cues into existing transformer-based detectors.

The paper tackles the challenge of fashion item detection by proposing Holi-DETR, a novel method that leverages contextual information to reduce ambiguities, resulting in performance improvements of 3.6 percentage points over vanilla DETR and 1.1 percentage points over Co-DETR in average precision.

Fashion item detection is challenging due to the ambiguities introduced by the highly diverse appearances of fashion items and the similarities among item subcategories. To address this challenge, we propose a novel Holistic Detection Transformer (Holi-DETR) that detects fashion items in outfit images holistically, by leveraging contextual information. Fashion items often have meaningful relationships as they are combined to create specific styles. Unlike conventional detectors that detect each item independently, Holi-DETR detects multiple items while reducing ambiguities by leveraging three distinct types of contextual information: (1) the co-occurrence relationship between fashion items, (2) the relative position and size based on inter-item spatial arrangements, and (3) the spatial relationships between items and human body key-points. To this end, we propose a novel architecture that integrates these three types of heterogeneous contextual information into the Detection Transformer (DETR) and its subsequent models. In experiments, the proposed methods improved the performance of the vanilla DETR and the more recently developed Co-DETR by 3.6 percentage points (pp) and 1.1 pp, respectively, in terms of average precision (AP).
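To make the three contextual cues named in the abstract concrete, here is a minimal Python sketch of how each could be computed as a raw feature. It is an illustration only and does not show how Holi-DETR encodes or integrates these cues into DETR, which the abstract does not specify. The function names, the (cx, cy, w, h) box format, and the choice of normalisation are all assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_probs(outfits):
    """Estimate P(b present | a present) from per-image category lists.

    `outfits` is a list of category-name lists, one per image (assumed format).
    """
    single = Counter()
    pair = Counter()
    for items in outfits:
        cats = set(items)  # count each category once per image
        single.update(cats)
        for a, b in combinations(sorted(cats), 2):
            pair[(a, b)] += 1
            pair[(b, a)] += 1
    return {(a, b): n / single[a] for (a, b), n in pair.items()}


def relative_geometry(box_a, box_b):
    """Offset and log size ratio of box_b relative to box_a.

    Boxes are (cx, cy, w, h); offsets are scaled by box_a's size (assumption).
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return (
        (bx - ax) / aw,
        (by - ay) / ah,
        math.log(bw / aw),
        math.log(bh / ah),
    )


def keypoint_offsets(box, keypoints):
    """Offsets from an item's box centre to each body keypoint.

    `keypoints` maps a name to (x, y); offsets are scaled by the box size.
    """
    cx, cy, w, h = box
    return {
        name: ((kx - cx) / w, (ky - cy) / h)
        for name, (kx, ky) in keypoints.items()
    }
```

For example, `cooccurrence_probs([["shirt", "jeans"], ["shirt", "skirt"], ["dress"]])` gives a conditional frequency for every ordered pair of categories that appear together at least once. In a detector, statistics like these would typically be embedded and fed to the model rather than used directly.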

