CV Nov 30, 2018
The GAN that Warped: Semantic Attribute Editing with Unpaired Data
Garoe Dorta, Sara Vicente, Neill D. F. Campbell et al.
Deep neural networks have recently been used to edit images with great success, in particular for faces. However, they are often limited to a restricted range of resolutions. Many methods are so flexible that face edits can often result in an unwanted loss of identity. This work proposes to learn how to perform semantic image edits through the application of smooth warp fields. Previous approaches that attempted to use warping for semantic edits required paired data, i.e. example images of the same subject with different semantic attributes. In contrast, we employ recent advances in Generative Adversarial Networks that allow our model to be trained with unpaired data. We demonstrate face editing at very high resolutions (4k images) with a single forward pass of a deep network at a lower resolution. We also show that our edits are substantially better at preserving the subject's identity. The robustness of our approach is demonstrated by showing plausible image editing results on the Cub200 birds dataset. To our knowledge this has not been previously accomplished, due to the challenging nature of the dataset.
CV Apr 14, 2018
Physics-driven Fire Modeling from Multi-view Images
Garoe Dorta, Luca Benedetti, Dmitry Kit et al.
Fire effects are widely used in various computer graphics applications such as visual effects and video games. Modeling the shape and appearance of fire phenomena is challenging as the underlying effects are driven by complex laws of physics. State-of-the-art fire modeling techniques rely on sophisticated physical simulations which require intensive parameter tuning, or use simplifications which produce physically invalid results. In this paper, we present a novel method of reconstructing physically valid fire models from multi-view stereo images. Our method, for the first time, provides plausible estimation of physical properties (e.g., temperature, density) of a fire volume using RGB cameras. This allows for a number of novel phenomena such as global fire illumination effects. The effectiveness and usefulness of our method are tested by generating fire models from a variety of input data, and applying the reconstructed fire models to the realistic illumination of virtual scenes.
ML Apr 3, 2018
Training VAEs Under Structured Residuals
Garoe Dorta, Sara Vicente, Lourdes Agapito et al.
Variational auto-encoders (VAEs) are a popular and powerful deep generative model. Previous works on VAEs have assumed a factorized likelihood model, whereby the output uncertainty of each pixel is assumed to be independent. This approximation is clearly limited, as demonstrated by observing a residual image from a VAE reconstruction, which often possesses a high level of structure. This paper demonstrates a novel scheme to incorporate a structured Gaussian likelihood prediction network within the VAE that allows the residual correlations to be modeled. Our novel architecture, with minimal increase in complexity, incorporates the covariance matrix prediction within the VAE. We also propose a new mechanism for allowing structured uncertainty on color images. Furthermore, we provide a scheme for effectively training this model, and include some suggestions for improving performance in terms of efficiency or modeling longer-range correlations.
ML Feb 20, 2018
Structured Uncertainty Prediction Networks
Garoe Dorta, Sara Vicente, Lourdes Agapito et al.
This paper is the first work to propose a network to predict a structured uncertainty distribution for a synthesized image. Previous approaches have been mostly limited to predicting diagonal covariance matrices. Our novel model learns to predict a full Gaussian covariance matrix for each reconstruction, which permits efficient sampling and likelihood evaluation. We demonstrate that our model can accurately reconstruct ground-truth correlated residual distributions for synthetic datasets and generate plausible high-frequency samples for real face images. We also illustrate the use of these predicted covariances for structure-preserving image denoising.
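
As a rough illustration of why a full Gaussian covariance can still permit efficient sampling and likelihood evaluation, the sketch below assumes the covariance is available through a lower-triangular Cholesky factor `L` with `Sigma = L L^T`. This parameterization is an assumption made here for illustration; the abstract does not say how the network represents the covariance. The function names `sample` and `log_likelihood` are hypothetical, and the code uses plain Python lists rather than any particular deep learning library.

```python
import math
import random


def sample(mu, L, rng):
    """Draw x = mu + L z with z ~ N(0, I), so that x ~ N(mu, L L^T)."""
    n = len(mu)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # L is lower triangular, so row i only uses z[0..i].
    return [mu[i] + sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(n)]


def log_likelihood(x, mu, L):
    """Log density of N(mu, L L^T) at x, without forming or inverting Sigma."""
    n = len(mu)
    r = [x[i] - mu[i] for i in range(n)]
    # Forward substitution solves L y = r for y.
    y = []
    for i in range(n):
        acc = r[i] - sum(L[i][j] * y[j] for j in range(i))
        y.append(acc / L[i][i])
    # log det(Sigma) = 2 * sum(log diag(L)); the quadratic form is ||y||^2.
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    quad = sum(v * v for v in y)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


if __name__ == "__main__":
    rng = random.Random(0)
    mu = [0.0, 0.0]
    L = [[2.0, 0.0], [1.0, 1.0]]  # hypothetical factor with correlated outputs
    x = sample(mu, L, rng)
    print(x, log_likelihood(x, mu, L))
```

With a triangular factor, sampling is a single matrix-vector product and the likelihood needs only a triangular solve and the diagonal of `L`, so neither step requires an explicit matrix inverse or determinant.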