CV, LG | Apr 12, 2020

YouMakeup VQA Challenge: Towards Fine-grained Action Understanding in Domain-Specific Videos

arXiv:2004.05573v1
AI Analysis

This work addresses the challenge of fine-grained action understanding for researchers in computer vision and video analysis, but it is incremental as it builds on existing VQA benchmarks by focusing on a specific domain.

The paper introduces the YouMakeup VQA Challenge 2020, which targets fine-grained action understanding in domain-specific videos such as makeup tutorials. It proposes two novel question-answering tasks, Facial Image Ordering and Step Ordering, with baseline models reported to reach 0.65 accuracy on Facial Image Ordering and 0.72 on Step Ordering.

The goal of the YouMakeup VQA Challenge 2020 is to provide a common benchmark for fine-grained action understanding in domain-specific videos, e.g. makeup instructional videos. We propose two novel question-answering tasks to evaluate models' fine-grained action understanding abilities. The first task is Facial Image Ordering, which aims to understand visual effects of different actions expressed in natural language to the facial object. The second task is Step Ordering, which aims to measure cross-modal semantic alignments between untrimmed videos and multi-sentence texts. In this paper, we present the challenge guidelines, the dataset used, and performances of baseline models on the two proposed tasks. The baseline codes and models are released at https://github.com/AIM3-RUC/YouMakeup_Baseline.
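
Both tasks ask a model to recover an order, so one plausible way to score them is multiple-choice accuracy. The sketch below is illustrative only: it assumes each question offers several candidate orderings and the model answers by picking one candidate index. The function name `ordering_accuracy` and the example data are hypothetical and are not taken from the released baseline code.

```python
from typing import Sequence


def ordering_accuracy(predicted: Sequence[int], correct: Sequence[int]) -> float:
    """Fraction of questions where the chosen candidate ordering is the correct one.

    Assumption: each question is answered by the index of one candidate ordering.
    """
    if len(predicted) != len(correct):
        raise ValueError("predicted and correct must have the same length")
    if not correct:
        raise ValueError("need at least one question")
    hits = sum(p == c for p, c in zip(predicted, correct))
    return hits / len(correct)


# Hypothetical data: four questions, each answered with a candidate index.
predicted_choices = [0, 2, 1, 3]
correct_choices = [0, 2, 3, 3]
accuracy = ordering_accuracy(predicted_choices, correct_choices)
print(f"accuracy = {accuracy:.2f}")
```

Under this reading, a score such as 0.72 would mean the model picked the correct ordering for 72% of the questions.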
