CV, Aug 12, 2018

Language Guided Fashion Image Manipulation with Feature-wise Transformations

arXiv:1808.04000v1, 27 citations
Originality: Incremental advance
AI Analysis

This work addresses a challenging task in fashion and design by enabling precise image manipulation through language, though it appears incremental as it builds on existing methods like FiLM and GANs.

The paper tackles the problem of editing fashion images from natural language descriptions. It proposes FiLMedGAN, which uses feature-wise linear modulation (FiLM) to transform visual features without extra spatial information, and reports more plausible and better-localized outfit generation than the baseline work.
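As a rough illustration of feature-wise linear modulation, the sketch below scales and shifts each channel of a feature map with a per-channel pair (gamma, beta) predicted from a text embedding. It is a minimal sketch in plain Python lists, not the paper's implementation: the helper names `linear` and `film`, the toy weights, and the 2-channel feature map are all assumptions made for illustration, and a real model would learn the weights and use a tensor library.

```python
def linear(vec, weights, bias):
    """Affine map: one output value per row of `weights` (hypothetical helper)."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def film(features, gamma, beta):
    """Apply gamma[c] * x + beta[c] to every spatial position of channel c.

    `features` is a nested list indexed as [channel][row][column].
    """
    return [[[g * x + b for x in row] for row in channel]
            for channel, g, b in zip(features, gamma, beta)]


# Toy text embedding standing in for an encoded sentence (assumed values).
text_embedding = [1.0, 2.0]

# Toy projection weights; in practice these would be learned.
w_gamma, b_gamma = [[0.5, 0.0], [0.0, 0.5]], [0.0, 0.0]
w_beta, b_beta = [[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0]

# The conditioning input decides how each channel is scaled and shifted.
gamma = linear(text_embedding, w_gamma, b_gamma)  # [0.5, 1.0]
beta = linear(text_embedding, w_beta, b_beta)     # [1.0, -2.0]

# Two channels, each a 2x2 spatial map.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]

modulated = film(features, gamma, beta)
print(modulated)
```

Because gamma and beta are shared across all spatial positions of a channel, the modulation itself carries no explicit spatial information; any localization has to come from the visual features being modulated.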

Developing techniques for editing an outfit image through natural sentences and accordingly generating new outfits has promising applications for art, fashion and design. However, this is a challenging task, since image manipulation should be carried out only on the relevant parts of the image while keeping the remaining sections untouched. Moreover, this manipulation process should generate an image that is as realistic as possible. In this work, we propose FiLMedGAN, which leverages feature-wise linear modulation (FiLM) to relate and transform visual features with natural language representations without using extra spatial information. Our experiments demonstrate that this approach, when combined with skip connections and total variation regularization, produces more plausible results than the baseline work, and has a better localization capability when generating new outfits consistent with the target description.
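To make the total variation regularization mentioned above concrete, here is a minimal sketch of an anisotropic total variation penalty on a single-channel image. The abstract does not state which variant or weighting the authors use, so the anisotropic form, the function name `total_variation`, and the toy images are assumptions for illustration only.

```python
def total_variation(image):
    """Sum of absolute differences between vertically and horizontally
    adjacent pixels of a 2D image given as a list of rows."""
    height = len(image)
    width = len(image[0])
    tv = 0.0
    for i in range(height):
        for j in range(width):
            if i + 1 < height:  # neighbour in the next row
                tv += abs(image[i + 1][j] - image[i][j])
            if j + 1 < width:   # neighbour in the next column
                tv += abs(image[i][j + 1] - image[i][j])
    return tv


flat = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]   # constant image
edge = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]   # one clean vertical edge
noisy = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]  # checkerboard-like noise

tv_flat = total_variation(flat)    # 0.0
tv_edge = total_variation(edge)    # 2.0
tv_noisy = total_variation(noisy)  # 7.0
print(tv_flat, tv_edge, tv_noisy)
```

Adding such a term to a generator's loss penalizes high-frequency noise far more than clean edges, which is one plausible reason it helps the generated outfits look smoother.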
