An Empirical Analysis of GPT-4V's Performance on Fashion Aesthetic Evaluation
The paper addresses fashion aesthetic evaluation for AI applications, but its contribution is incremental: it applies an existing model to a new domain.
The paper evaluates GPT-4V's zero-shot performance on fashion aesthetic evaluation and finds that its predictions align fairly well with human judgments, but that it struggles to rank outfits in similar colors.
Fashion aesthetic evaluation is the task of estimating how well the outfits worn by individuals in images suit them. In this work, we examine the zero-shot performance of GPT-4V on this task for the first time. We show that its predictions align fairly well with human judgments on our datasets, and also find that it struggles with ranking outfits in similar colors. The code is available at https://github.com/st-tech/gpt4v-fashion-aesthetic-evaluation.
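One way to make "alignment with human judgments" on a ranking task concrete is a pairwise agreement rate: the fraction of outfit pairs that the model orders the same way as human raters. The sketch below is illustrative only. The abstract does not state which metric the paper uses, and the function name `pairwise_agreement` and the example scores are hypothetical, not taken from the paper or its repository.

```python
from itertools import combinations


def pairwise_agreement(human_scores, model_scores):
    """Fraction of outfit pairs that the model orders the same way as humans.

    Hypothetical helper for illustration; not the paper's metric.
    Pairs that humans rate as tied are skipped, and a model tie on a
    pair that humans do rank counts as a disagreement.
    """
    if len(human_scores) != len(model_scores):
        raise ValueError("score lists must have the same length")

    agree = 0
    total = 0
    for i, j in combinations(range(len(human_scores)), 2):
        human_diff = human_scores[i] - human_scores[j]
        if human_diff == 0:
            continue  # no human preference for this pair
        model_diff = model_scores[i] - model_scores[j]
        total += 1
        if human_diff * model_diff > 0:
            agree += 1  # same sign means same ordering

    return agree / total if total else float("nan")


# Made-up scores for four outfits (higher means the outfit suits the person better).
human = [4.5, 3.0, 2.0, 1.0]
model = [4.0, 2.5, 3.0, 1.0]
print(pairwise_agreement(human, model))
```

With these made-up scores, the model orders five of the six pairs as the human raters do, so the agreement rate is about 0.83. A rank correlation such as Spearman's would be a reasonable alternative when a single summary over the full ranking is preferred.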