DETR-based Layered Clothing Segmentation and Fine-Grained Attribute Recognition
This work addresses challenges in fashion computer vision relevant to applications such as e-commerce, although its contribution is incremental because it builds on existing DETR-based methods.
The paper tackles the problem of segmenting layered clothing and recognizing fine-grained attributes from human images, achieving state-of-the-art results on the Fashionpedia dataset.
Clothing segmentation and fine-grained attribute recognition are challenging tasks at the intersection of computer vision and fashion. Given an input image of a person, they require segmenting every clothing instance in the ensemble and recognizing the detailed attributes of each clothing item. Many models have been developed for these tasks in recent years; nevertheless, segmentation accuracy remains unsatisfactory for layered clothing and for fashion items at different scales. In this paper, we propose a new DEtection TRansformer (DETR) based method that segments ensemble clothing instances and recognizes their fine-grained attributes with high accuracy. Within this model, we introduce a \textbf{multi-layered attention module} that aggregates features of different scales, determines the components of a single instance at each scale, and merges them. We train our model on the Fashionpedia dataset and demonstrate that our method surpasses state-of-the-art (SOTA) models in layered clothing segmentation and fine-grained attribute recognition.
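The abstract describes the multi-layered attention module only at a high level. To illustrate the general idea of attending to one instance's features across scales and merging them, the following is a minimal sketch under stated assumptions, not the authors' module. The function `merge_scales` and its inputs are hypothetical: each scale contributes a single pooled feature vector for the instance, the query stands in for a DETR-style object query, there are no learned projections, and a single attention head is used.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def merge_scales(query: Sequence[float],
                 scale_features: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
    """Hypothetical multi-scale merge for one instance.

    Each entry of `scale_features` is an (assumed) pooled feature vector for the
    same instance at one scale. The query attends over the scales with scaled
    dot-product attention (keys = values = the features, no learned projections),
    and the merged feature is the attention-weighted sum of the per-scale features.
    Returns (weights_over_scales, merged_feature).
    """
    if not scale_features:
        raise ValueError("need at least one scale")
    dim = len(query)
    if any(len(f) != dim for f in scale_features):
        raise ValueError("all features must match the query dimension")

    # Relevance of each scale to the query, scaled by sqrt(dim) as in Transformers.
    scores = [dot(query, f) / math.sqrt(dim) for f in scale_features]
    weights = softmax(scores)

    # Weighted sum across scales, one output channel at a time.
    merged = [sum(w * f[i] for w, f in zip(weights, scale_features))
              for i in range(dim)]
    return weights, merged
```

For example, `merge_scales([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])` assigns equal weights to the first and third scales, which align with the query, and a smaller weight to the second. In an actual DETR-style model the query would be a learned object query and the keys and values would come from learned linear projections of multi-scale backbone or encoder features.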