CV, Dec 19, 2016

Semantic Jitter: Dense Supervision for Visual Comparisons via Synthetic Images

arXiv:1612.06341v2, 162 citations
AI Analysis

This work addresses the sample sparsity problem in fine-grained visual comparison, offering a practical approach for domains such as facial analysis and fashion.

The paper tackles the problem of learning visual comparisons from limited training data. It augments real image pairs with synthetically generated images that exhibit slight attribute modifications, which improves attribute ranking models on face and fashion datasets.

Distinguishing subtle differences in attributes is valuable, yet learning to make visual comparisons remains non-trivial. Not only is the number of possible comparisons quadratic in the number of training images, but also access to images adequately spanning the space of fine-grained visual differences is limited. We propose to overcome the sparsity of supervision problem via synthetically generated images. Building on a state-of-the-art image generation engine, we sample pairs of training images exhibiting slight modifications of individual attributes. Augmenting real training image pairs with these examples, we then train attribute ranking models to predict the relative strength of an attribute in novel pairs of real images. Our results on datasets of faces and fashion images show the great promise of bootstrapping imperfect image generators to counteract sample sparsity for learning to rank.
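
The augmentation idea in the abstract can be illustrated with a toy sketch. This is not the paper's pipeline: the image generator is replaced by a hypothetical `make_jitter_pairs` helper that shifts a feature vector slightly along a single known attribute direction, and the ranker is a plain linear model trained with a logistic pairwise loss. All names, dimensions, and hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

def make_jitter_pairs(features, attr_direction, step):
    """Hypothetical stand-in for an image generator: return (stronger, weaker)
    pairs that differ only by a small shift along one attribute direction."""
    stronger = features + step * attr_direction
    return stronger, features

def train_linear_ranker(stronger, weaker, epochs=200, lr=0.1):
    """Fit w so that w @ stronger > w @ weaker, using the logistic
    pairwise loss log(1 + exp(-w @ (stronger - weaker)))."""
    diff = stronger - weaker
    w = np.zeros(diff.shape[1])
    for _ in range(epochs):
        margin = diff @ w
        # Gradient of the mean pairwise logistic loss with respect to w.
        grad = -(diff / (1.0 + np.exp(margin))[:, None]).mean(axis=0)
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
dim = 8
attr_direction = np.zeros(dim)
attr_direction[0] = 1.0  # assumed: the attribute is the first feature axis

# Toy "real" data: 20 feature vectors, but only 5 labeled comparisons,
# far fewer than the 20 * 19 / 2 possible pairs.
real = rng.normal(size=(20, dim))
idx_a = rng.integers(0, 20, size=5)
idx_b = rng.integers(0, 20, size=5)
scores = real @ attr_direction
swap = scores[idx_a] < scores[idx_b]
real_stronger = np.where(swap[:, None], real[idx_b], real[idx_a])
real_weaker = np.where(swap[:, None], real[idx_a], real[idx_b])

# Synthetic "jittered" pairs: one slight attribute modification per sample.
syn_stronger, syn_weaker = make_jitter_pairs(real, attr_direction, step=0.1)

# Augment the sparse real pairs with the dense synthetic ones, then train.
stronger_all = np.concatenate([real_stronger, syn_stronger])
weaker_all = np.concatenate([real_weaker, syn_weaker])
w = train_linear_ranker(stronger_all, weaker_all)

# Rank a novel pair: the second item has more of the attribute.
novel = rng.normal(size=dim)
novel_more = novel + 0.5 * attr_direction
prefers_more = bool(w @ novel_more > w @ novel)
```

In this toy setting the synthetic pairs isolate the attribute of interest, so they supply a clean ranking signal even when the labeled real pairs are few. A real generator would be imperfect, which is the case the paper addresses.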
