Attention-based Fusion for Multi-source Human Image Generation
This work addresses person-image generation, offering an incremental improvement that extends single-source methods to handle multiple source images.
The paper tackles the problem of generating human images conditioned on target poses using multiple source appearance images, proposing a local attention mechanism that selects relevant information from different source regions without requiring specialized generators for different numbers of sources. The empirical evaluation demonstrates the practical value of this multi-source approach.
We present a generalization of the person-image generation task, in which a human image is generated conditioned on a target pose and a set X of source appearance images. In this way, we can exploit multiple, possibly complementary images of the same person, which are usually available at both training and testing time. Our solution is based mainly on a local attention mechanism that selects relevant information from different source image regions, avoiding the need to build a separate generator for each cardinality of X. The empirical evaluation of our method shows the practical interest of addressing the person-image generation problem in a multi-source setting.
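
To illustrate why attention over sources removes the dependence on the cardinality of X, the following is a minimal sketch and not the authors' implementation. It assumes each source image has already been encoded into per-position feature vectors, and that a scalar relevance score per source and position is available; how those scores are computed is not specified here. The function name `fuse_sources` and both argument layouts are hypothetical.

```python
import math


def fuse_sources(features, scores):
    """Fuse N source feature maps into one, position by position.

    Assumed (hypothetical) layout:
      features[n][p] : feature vector (list of floats) of source n at position p
      scores[n][p]   : scalar relevance score of source n at position p
    Returns (fused, weights), where fused[p] is a feature vector and
    weights[n][p] is the attention weight of source n at position p.
    """
    n_sources = len(features)
    n_positions = len(features[0])
    fused = []
    weights = [[0.0] * n_positions for _ in range(n_sources)]
    for p in range(n_positions):
        # Softmax over the source axis only (max subtracted for stability),
        # so the weights at each position sum to 1 for any number of sources.
        top = max(scores[n][p] for n in range(n_sources))
        exps = [math.exp(scores[n][p] - top) for n in range(n_sources)]
        total = sum(exps)
        for n in range(n_sources):
            weights[n][p] = exps[n] / total
        # Weighted sum of the sources' feature vectors at this position.
        dim = len(features[0][p])
        fused.append([
            sum(weights[n][p] * features[n][p][c] for n in range(n_sources))
            for c in range(dim)
        ])
    return fused, weights


# Two sources, two positions, one feature channel, equal scores everywhere:
# each position receives the plain average of the two sources.
fused, weights = fuse_sources(
    features=[[[0.0], [2.0]], [[4.0], [6.0]]],
    scores=[[0.0, 0.0], [0.0, 0.0]],
)
```

Because the normalization runs over the source axis, the fused output has the same shape whether X contains one image or many, so a single downstream generator could consume it unchanged.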