V. Lempitsky

CVSep 10, 2017

E. Ustinova, V. Lempitsky

Face verification and recognition problems have seen rapid progress in recent years, however recognition from small size images remains a challenging task that is inherently intertwined with the task of face super-resolution. Tackling this problem using multiple frames is an attractive idea, yet requires solving the alignment problem that is also challenging for low-resolution faces. Here we present a holistic system for multi-frame recognition, alignment, and superresolution of faces. Our neural network architecture restores the central frame of each input sequence additionally taking into account a number of adjacent frames and making use of sub-pixel movements. We present our results using the popular dataset for video face recognition (YouTube Faces). We show a notable improvement of identification score compared to several baselines including the one based on single-image super-resolution.

CVDec 22, 2016

Set2Model Networks: Learning Discriminatively To Learn Generative Models

A. Vakhitov, A. Kuzmin, V. Lempitsky

We present a new "learning-to-learn"-type approach that enables rapid learning of concepts from small-to-medium sized training sets and is primarily designed for web-initialized image retrieval. At the core of our approach is a deep architecture (a Set2Model network) that maps sets of examples to simple generative probabilistic models such as Gaussians or mixtures of Gaussians in the space of high-dimensional descriptors. The parameters of the embedding into the descriptor space are trained in the end-to-end fashion in the meta-learning stage using a set of training learning problems. The main technical novelty of our approach is the derivation of the backprop process through the mixture model fitting, which makes the likelihood of the resulting models differentiable with respect to the positions of the input descriptors. While the meta-learning process for a Set2Model network is discriminative, a trained Set2Model network performs generative learning of generative models in the descriptor space, which facilitates learning in the cases when no negative examples are available, and whenever the concept being learned is polysemous or represented by noisy training sets. Among other experiments, we demonstrate that these properties allow Set2Model networks to pick visual concepts from the raw outputs of Internet image search engines better than a set of strong baselines.

V. Lempitsky

2 Papers