IR, AI, LG, MM | Jul 21, 2022

Unimodal vs. Multimodal Siamese Networks for Outfit Completion

arXiv:2207.10355v1 | 1 citation | h-index: 5
Originality: Synthesis-oriented
AI Analysis

This work addresses outfit recommendation for online fashion shoppers. Its contribution is incremental: it builds on existing siamese network methods by adding multimodal data.

The paper tackles the Fill in the Blank (FITB) task for fashion outfit completion by applying siamese networks and exploring the integration of textual and visual data. It finds that models using visual data, alone or combined with textual data, yield promising results on test splits from the SIGIR 2022 eCommerce workshop challenge.

The popularity of online fashion shopping continues to grow, and the ability to offer effective recommendations to customers is becoming increasingly important. In this work, we focus on the Fashion Outfits Challenge, part of the SIGIR 2022 Workshop on eCommerce. The challenge is centered on the Fill in the Blank (FITB) task, which requires predicting the missing item given an incomplete outfit and a list of candidates. In this paper, we focus on applying siamese networks to the task. More specifically, we explore how combining information from multiple modalities (textual and visual) impacts the performance of the model. We evaluate our model on the test split provided by the challenge organizers and on a test split with gold assignments that we created during the development phase. We find that both the model using visual data only and the model using visual and textual data together demonstrate promising results on the task. We conclude by suggesting directions for further improvement of our method.
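To make the FITB setup concrete, the following is a minimal sketch of how a candidate could be picked once item embeddings are available. It is not the authors' implementation: the function names (`fuse`, `fill_in_the_blank`), the concatenation-based fusion, and the mean-cosine-similarity scoring are all assumptions made for illustration, and the embeddings would in practice come from a trained siamese encoder.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (eps guards against zero vectors)."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def fuse(visual, textual):
    """Hypothetical multimodal fusion: normalize each modality, then concatenate.

    visual:  (n_items, d_visual) array of image embeddings
    textual: (n_items, d_text) array of text embeddings
    """
    joint = np.concatenate([l2_normalize(visual), l2_normalize(textual)], axis=-1)
    return l2_normalize(joint)


def fill_in_the_blank(outfit_emb, candidate_emb):
    """Score each candidate by its mean cosine similarity to the outfit items.

    Assumed scoring rule for illustration only.
    Returns the index of the best candidate and the per-candidate scores.
    """
    outfit = l2_normalize(outfit_emb)
    candidates = l2_normalize(candidate_emb)
    similarities = candidates @ outfit.T      # (n_candidates, n_outfit_items)
    scores = similarities.mean(axis=1)        # one score per candidate
    return int(np.argmax(scores)), scores
```

A unimodal variant would pass the visual (or textual) embeddings directly to `fill_in_the_blank`, while the multimodal variant would first call `fuse` on both the incomplete outfit and the candidates.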

Code Implementations: 1 repo