CVDec 18, 2018
Learning Direct Optimization for Scene UnderstandingLukasz Romaszko, Christopher K. I. Williams, John Winn
We develop a Learning Direct Optimization (LiDO) method for the refinement of a latent variable model that describes input image x. Our goal is to explain a single image x with an interpretable 3D computer graphics model having scene graph latent variables z (such as object appearance, camera position). Given a current estimate of z we can render a prediction of the image g(z), which can be compared to the image x. The standard way to proceed is then to measure the error E(x, g(z)) between the two, and use an optimizer to minimize the error. However, it is unknown which error measure E would be most effective for simultaneously addressing issues such as misaligned objects, occlusions, textures, etc. In contrast, the LiDO approach trains a Prediction Network to predict an update directly to correct z, rather than minimizing the error with respect to z. Experiments show that our LiDO method converges rapidly as it does not need to perform a search on the error landscape, produces better solutions than error-based competitors, and is able to handle the mismatch between the data and the fitted scene model. We apply LiDO to a realistic synthetic dataset, and show that the method also transfers to work well with real images.
MLNov 7, 2016
Gaussian Attention Model and Its Application to Knowledge Base Embedding and Question AnsweringLiwen Zhang, John Winn, Ryota Tomioka
We propose the Gaussian attention model for content-based neural memory access. With the proposed attention model, a neural network has the additional degree of freedom to control the focus of its attention from a laser sharp attention to a broad attention. It is applicable whenever we can assume that the distance in the latent space reflects some notion of semantics. We use the proposed attention model as a scoring function for the embedding of a knowledge base into a continuous vector space and then train a model that performs question answering about the entities in the knowledge base. The proposed attention model can handle both the propagation of uncertainty when following a series of relations and also the conjunction of conditions in a natural way. On a dataset of soccer players who participated in the FIFA World Cup 2014, we demonstrate that our model can handle both path queries and conjunctive queries well.
CVOct 27, 2014
Consensus Message Passing for Layered Graphical ModelsVarun Jampani, S. M. Ali Eslami, Daniel Tarlow et al.
Generative models provide a powerful framework for probabilistic reasoning. However, in many domains their use has been hampered by the practical difficulties of inference. This is particularly the case in computer vision, where models of the imaging process tend to be large, loopy and layered. For this reason bottom-up conditional models have traditionally dominated in such domains. We find that widely-used, general-purpose message passing inference algorithms such as Expectation Propagation (EP) and Variational Message Passing (VMP) fail on the simplest of vision models. With these models in mind, we introduce a modification to message passing that learns to exploit their layered structure by passing 'consensus' messages that guide inference towards good solutions. Experiments on a variety of problems show that the proposed technique leads to significantly more accurate inference results, not only when compared to standard EP and VMP, but also when compared to competitive bottom-up conditional models.