CV
Aug 17, 2023

ICAR: Image-based Complementary Auto Reasoning

arXiv:2308.09119v1
1 citation
h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses the problem of generating compatible item sets across domains such as fashion and furniture. The advance is incremental: it builds on existing CIR methods and contributes a new framework.

The paper tackles Scene-aware Complementary Item Retrieval (CIR) by proposing a visual compatibility concept and a Flexible Bidirectional Transformer (FBT) framework. Compared with state-of-the-art methods, it reports improvements of up to 5.3% and 9.6% in FITB score and 22.3% and 31.8% in SFID on the fashion and furniture datasets, respectively.

Scene-aware Complementary Item Retrieval (CIR) is a challenging task that requires generating a set of compatible items across domains. Because compatibility is subjective, it is difficult to set a rigorous standard for either data collection or learning objectives. To address this, we propose a visual compatibility concept composed of similarity (resemblance in color, geometry, texture, etc.) and complementarity (different items, such as a table and a chair, completing a group). Based on this notion, we propose a compatibility learning framework, the category-aware Flexible Bidirectional Transformer (FBT), for visual "scene-based set compatibility reasoning" with cross-domain visual similarity input and auto-regressive complementary item generation. FBT consists of an encoder with flexible masking, a category prediction arm, and an auto-regressive visual embedding prediction arm. Its inputs are cross-domain visual similarity invariant embeddings, which makes the framework generalizable. Furthermore, FBT learns inter-object compatibility from a large set of scene images in a self-supervised way. Compared with state-of-the-art (SOTA) methods, this approach achieves improvements of up to 5.3% and 9.6% in FITB score and 22.3% and 31.8% in SFID on fashion and furniture, respectively.
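
The FITB (fill-in-the-blank) score cited above is commonly computed by removing one item from a compatible set, asking the model to pick the missing item from a small list of candidates, and reporting the fraction of questions answered correctly. The sketch below illustrates that idea only. It assumes the model has already produced a predicted embedding for the missing item and that candidates are ranked by cosine similarity to it; the function names (`cosine`, `fitb_choice`, `fitb_accuracy`), the scoring rule, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import Iterable, Sequence, Tuple

Vector = Sequence[float]
# One FITB question: (predicted embedding, candidate embeddings, index of the true item).
Question = Tuple[Vector, Sequence[Vector], int]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def fitb_choice(predicted: Vector, candidates: Sequence[Vector]) -> int:
    """Return the index of the candidate most similar to the predicted embedding.

    Assumption: cosine similarity is the ranking rule; a real system may score
    candidates differently.
    """
    scores = [cosine(predicted, c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


def fitb_accuracy(questions: Iterable[Question]) -> float:
    """Fraction of questions where the top-ranked candidate is the true item."""
    questions = list(questions)
    if not questions:
        return 0.0
    correct = sum(fitb_choice(p, c) == answer for p, c, answer in questions)
    return correct / len(questions)


# Toy 2-D embeddings, for illustration only.
questions = [
    ([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]], 0),  # answered correctly
    ([0.0, 1.0], [[1.0, 0.0], [0.1, 0.9], [0.5, 0.5]], 1),   # answered correctly
    ([1.0, 1.0], [[1.0, 0.0], [0.7, 0.7], [0.0, -1.0]], 0),  # model picks index 1
]
accuracy = fitb_accuracy(questions)
print(f"FITB accuracy: {accuracy:.3f}")
```

In this toy run two of the three questions are answered correctly, so the accuracy is 2/3. A reported gain such as "5.3% in FITB score" refers to a change in this kind of accuracy against a baseline, though the exact protocol (candidate count, negative sampling) is defined by each benchmark.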

