Cross-Identity Motion Transfer for Arbitrary Objects through Pose-Attentive Video Reassembling
This paper addresses the problem of realistic motion transfer for arbitrary objects in video generation, though it appears to be an incremental improvement over existing attention-based approaches.
The paper tackles the problem of transferring motions between arbitrary objects using an attention-based network that reassembles source image pieces based on pose similarities, producing more realistic outputs than warping-based methods. Experimental results show the method produces visually pleasing results across various object domains with better performance than previous works.
We propose an attention-based network for transferring motions between arbitrary objects. Given one or more source images and a driving video, our network animates the subject in the source images according to the motion in the driving video. In our attention mechanism, dense similarities between the learned keypoints in the source and driving images are computed in order to retrieve appearance information from the source images. Taking a different approach from the well-studied warping-based models, our attention-based model has several advantages. By reassembling non-locally searched pieces of the source content, our approach can produce more realistic outputs. Furthermore, our system can make use of multiple observations of the source appearance (e.g., the front and sides of a face) to make the results more accurate. To reduce the training-testing discrepancy of self-supervised learning, a novel cross-identity training scheme is additionally introduced. With this training scheme, our network is trained to transfer motions between different subjects, as in the real testing scenario. Experimental results validate that our method produces visually pleasing results in various object domains, showing better performance than previous works.
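The retrieval step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that pose features derived from the learned keypoints are already available as per-location vectors, and it uses scaled dot-product similarity with a softmax, which the abstract does not specify. The names `pose_attention` and `flatten_sources`, the `temperature` parameter, and all shapes are hypothetical.

```python
import numpy as np

def flatten_sources(feature_maps):
    """Stack a list of (H, W, C) feature maps into one (sum of H*W, C) array.

    Assumption: several source images are handled by concatenating their
    spatial locations, so attention can search across all of them at once.
    """
    return np.concatenate(
        [f.reshape(-1, f.shape[-1]) for f in feature_maps], axis=0
    )

def pose_attention(driving_pose, source_pose, source_appearance, temperature=1.0):
    """Retrieve source appearance for each driving location by pose similarity.

    driving_pose:      (Nq, C) pose features of the driving frame (queries)
    source_pose:       (Nk, C) pose features of the source image(s) (keys)
    source_appearance: (Nk, D) appearance features of the source image(s) (values)
    Returns the reassembled appearance (Nq, D) and attention weights (Nq, Nk).
    """
    # Dense similarity between every driving location and every source location.
    scale = temperature * np.sqrt(driving_pose.shape[-1])
    scores = driving_pose @ source_pose.T / scale
    # Numerically stable softmax over the source locations.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output location is a weighted mix of non-local source pieces.
    return weights @ source_appearance, weights

# Usage with random stand-in features: two 4x4 source views, one driving frame.
rng = np.random.default_rng(0)
src_pose = flatten_sources([rng.normal(size=(4, 4, 8)) for _ in range(2)])
src_app = flatten_sources([rng.normal(size=(4, 4, 16)) for _ in range(2)])
drv_pose = rng.normal(size=(4, 4, 8)).reshape(-1, 8)
out, attn = pose_attention(drv_pose, src_pose, src_app)
```

In this sketch, each driving location can draw on any location in any source view, which is the property that distinguishes attention-based reassembly from warping a single source image.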