Towards Better Adversarial Synthesis of Human Images from Text
This work addresses the problem of generating realistic human images from text for computer vision applications, but it is incremental because it builds on existing SMPL and image synthesis frameworks.
The paper tackles the generation of 3D human meshes from text. It represents human shapes with the SMPL model and evaluates on the COCO dataset. It shows that the generated meshes capture scene dynamics and interactions between people, and that using them as input constrains image synthesis networks to produce realistic human shapes.
This paper proposes an approach that generates multiple 3D human meshes from text. The human shapes are represented by 3D meshes based on the SMPL model. The model's performance is evaluated on the COCO dataset, which contains challenging human shapes and intricate interactions between individuals. The model captures the dynamics of the scene and the interactions between individuals described in the text. We further show how using such shapes as input to image synthesis frameworks helps constrain the network to synthesize humans with realistic shapes.
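The abstract states only that human shapes are represented as SMPL-based meshes. As a rough illustration of what that representation involves, here is a minimal standard-library Python sketch. It holds one SMPL parameter set per person in a text-described scene and applies SMPL's linear shape blend shapes, vertex = template + Σ_k β_k S_k. The pose and shape sizes (72 pose values from 24 joints × 3 axis-angle components, 10 shape coefficients) follow the standard SMPL parameterization. Everything else is a hypothetical choice made for illustration and is not the paper's implementation: the per-person root translation, the names `SMPLParams`, `SceneHumans` and `apply_shape_blendshapes`, and the two-vertex toy mesh. Pose blend shapes and linear blend skinning are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Standard SMPL dimensions: 24 joints (23 body joints + root), axis-angle each.
NUM_JOINTS = 24
POSE_DIM = NUM_JOINTS * 3  # 72 pose values (theta)
SHAPE_DIM = 10             # leading shape coefficients (beta)


@dataclass
class SMPLParams:
    """Hypothetical per-person record: pose, shape, and an assumed root translation."""
    pose: List[float]
    shape: List[float]
    translation: Vec3 = (0.0, 0.0, 0.0)  # assumption: placement of the person in the scene

    def __post_init__(self) -> None:
        # Reject vectors that do not match the standard SMPL sizes.
        if len(self.pose) != POSE_DIM:
            raise ValueError(f"pose must have {POSE_DIM} values, got {len(self.pose)}")
        if len(self.shape) != SHAPE_DIM:
            raise ValueError(f"shape must have {SHAPE_DIM} values, got {len(self.shape)}")


@dataclass
class SceneHumans:
    """Hypothetical container: one caption and the SMPL parameters of every person in it."""
    caption: str
    people: List[SMPLParams] = field(default_factory=list)


def apply_shape_blendshapes(
    template: Sequence[Vec3],
    shape_dirs: Sequence[Sequence[Vec3]],
    betas: Sequence[float],
) -> List[Vec3]:
    """Return template + sum_k betas[k] * shape_dirs[k], computed vertex by vertex."""
    if len(betas) != len(shape_dirs):
        raise ValueError("need exactly one shape direction per beta")
    shaped: List[Vec3] = []
    for v_idx, (x, y, z) in enumerate(template):
        # Accumulate each shape direction's offset for this vertex, scaled by its beta.
        for beta, direction in zip(betas, shape_dirs):
            dx, dy, dz = direction[v_idx]
            x, y, z = x + beta * dx, y + beta * dy, z + beta * dz
        shaped.append((x, y, z))
    return shaped


if __name__ == "__main__":
    # Toy example: two people with neutral pose and shape, the second shifted along x.
    scene = SceneHumans(
        caption="two people shaking hands",
        people=[
            SMPLParams(pose=[0.0] * POSE_DIM, shape=[0.0] * SHAPE_DIM),
            SMPLParams(pose=[0.0] * POSE_DIM, shape=[0.0] * SHAPE_DIM, translation=(1.0, 0.0, 0.0)),
        ],
    )
    print(len(scene.people), "people for:", scene.caption)

    # Toy two-vertex mesh with two shape directions (real SMPL uses 6890 vertices).
    template = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    shape_dirs = [[(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)], [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0)]]
    print(apply_shape_blendshapes(template, shape_dirs, [0.5, 1.0]))
```

Storing a compact parameter vector per person, rather than raw vertices, keeps a multi-person output small. For example, 72 + 10 + 3 numbers per person replace 6890 × 3 vertex coordinates, and a full mesh can still be recovered with the SMPL model when a downstream synthesis network needs one.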