Recycle-GAN: Unsupervised Video Retargeting
This work addresses the problem of generating realistic video translations for applications such as entertainment and simulation, though it appears incremental because it builds on existing GAN-based methods.
The paper tackles unsupervised video retargeting by translating content between domains while preserving style, using spatiotemporal constraints and adversarial losses. It demonstrates the approach on tasks like face-to-face translation and cloud synthesis, showing advantages over spatial-only methods.
We introduce a data-driven approach for unsupervised video retargeting that translates content from one domain to another while preserving the style native to the target domain. For example, if the content of John Oliver's speech were transferred to Stephen Colbert, the generated content/speech should be in Stephen Colbert's style. Our approach combines spatial and temporal information with adversarial losses for content translation and style preservation. In this work, we first study the advantages of using spatiotemporal constraints over spatial constraints alone for effective retargeting. We then demonstrate the proposed approach on problems where information in both space and time matters, such as face-to-face translation, flower-to-flower translation, wind and cloud synthesis, and sunrise and sunset.
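The abstract does not spell out the loss terms, so the following is only a minimal sketch of how a spatial-only constraint might differ from a spatiotemporal one. It is not the authors' implementation. All names (`cycle_loss`, `recycle_loss`, `g_xy`, `g_yx`, `predict_y`) are hypothetical, the generators and the temporal predictor are toy functions rather than trained networks, and the adversarial losses are omitted entirely.

```python
import numpy as np

def cycle_loss(x_frames, g_xy, g_yx):
    """Spatial-only constraint: each frame should survive the round trip X -> Y -> X."""
    recon = np.stack([g_yx(g_xy(x)) for x in x_frames])
    return float(np.mean((recon - x_frames) ** 2))

def recycle_loss(x_frames, g_xy, g_yx, predict_y):
    """Spatiotemporal constraint (assumed form): translate frames to Y,
    predict the next Y frame from the two previous ones, map the prediction
    back to X, and compare it with the frame that actually came next."""
    y_frames = np.stack([g_xy(x) for x in x_frames])
    errors = []
    for t in range(2, len(x_frames)):
        y_next = predict_y(y_frames[t - 2], y_frames[t - 1])
        x_next = g_yx(y_next)
        errors.append(np.mean((x_next - x_frames[t]) ** 2))
    return float(np.mean(errors))

# Toy "video": five 2x2 frames whose intensity grows by 1 per time step.
x_frames = np.arange(5, dtype=float)[:, None, None] * np.ones((5, 2, 2))

# Toy generators (hypothetical stand-ins for learned networks).
g_xy = lambda x: 2.0 * x   # X -> Y
g_yx = lambda y: y / 2.0   # Y -> X

# Two toy temporal predictors in domain Y.
linear_predictor = lambda y_prev2, y_prev1: 2.0 * y_prev1 - y_prev2  # extrapolates motion
static_predictor = lambda y_prev2, y_prev1: y_prev1                  # ignores motion

spatial = cycle_loss(x_frames, g_xy, g_yx)
temporal_good = recycle_loss(x_frames, g_xy, g_yx, linear_predictor)
temporal_bad = recycle_loss(x_frames, g_xy, g_yx, static_predictor)

print(spatial, temporal_good, temporal_bad)
```

In this toy setup the spatial term is zero for any invertible per-frame mapping, so it cannot tell the two predictors apart. The temporal term is zero only when the predicted dynamics match the actual frame sequence, which is one way to read the claimed advantage of spatiotemporal over spatial constraints.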