CV · Apr 19, 2023

HyperStyle3D: Text-Guided 3D Portrait Stylization via Hypernetworks

arXiv:2304.09463v1 · 16 citations · h-index: 49
AI Analysis

This work addresses the need for 3D content in applications such as the metaverse and games by bypassing expensive 3D data acquisition, though it builds incrementally on existing 3D-aware GANs.

The paper tackles 3D portrait stylization without requiring costly 3D data. It proposes HyperStyle3D, a method built on 3D-aware GANs that uses a hyper-network to manipulate the generator's parameters and CLIP for text guidance. The result is 3D-consistent rendering across diverse styles, together with shape deformation and attribute editing.

Portrait stylization is a long-standing task enabling extensive applications. Although 2D-based methods have made great progress in recent years, real-world applications such as metaverse and games often demand 3D content. On the other hand, the requirement of 3D data, which is costly to acquire, significantly impedes the development of 3D portrait stylization methods. In this paper, inspired by the success of 3D-aware GANs that bridge 2D and 3D domains with 3D fields as the intermediate representation for rendering 2D images, we propose a novel method, dubbed HyperStyle3D, based on 3D-aware GANs for 3D portrait stylization. At the core of our method is a hyper-network learned to manipulate the parameters of the generator in a single forward pass. It not only offers a strong capacity to handle multiple styles with a single model, but also enables flexible fine-grained stylization that affects only texture, shape, or local part of the portrait. While the use of 3D-aware GANs bypasses the requirement of 3D data, we further alleviate the necessity of style images with the CLIP model being the stylization guidance. We conduct an extensive set of experiments across the style, attribute, and shape, and meanwhile, measure the 3D consistency. These experiments demonstrate the superior capability of our HyperStyle3D model in rendering 3D-consistent images in diverse styles, deforming the face shape, and editing various attributes.
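To make the hyper-network idea in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a style embedding (a random vector standing in for a CLIP text embedding), a frozen "generator" reduced to two weight matrices, and a multiplicative update of the form w * (1 + offset); the abstract does not specify any of these. The names `TinyHyperNetwork`, `stylize_weights`, `shape_layer`, and `texture_layer` are hypothetical and chosen only for illustration.

```python
import numpy as np


class TinyHyperNetwork:
    """Toy hyper-network: maps a style embedding to per-layer weight offsets.

    Hypothetical illustration only; the architecture and sizes are assumptions.
    """

    def __init__(self, embed_dim, layer_shapes, hidden_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.layer_shapes = dict(layer_shapes)
        # Shared trunk that processes the style embedding.
        self.w_in = rng.normal(0.0, 0.1, size=(embed_dim, hidden_dim))
        # One small output head per generator layer.
        self.heads = {
            name: rng.normal(0.0, 0.01, size=(hidden_dim, int(np.prod(shape))))
            for name, shape in self.layer_shapes.items()
        }

    def __call__(self, style_embedding):
        # A single forward pass yields offsets for every layer at once.
        hidden = np.tanh(style_embedding @ self.w_in)
        return {
            name: (hidden @ head).reshape(self.layer_shapes[name])
            for name, head in self.heads.items()
        }


def stylize_weights(base_weights, offsets, editable_layers):
    """Apply offsets only to the chosen layers; the rest stay frozen.

    The multiplicative form w * (1 + offset) is an assumed update rule.
    """
    return {
        name: w * (1.0 + offsets[name]) if name in editable_layers else w.copy()
        for name, w in base_weights.items()
    }


# Stand-in for a pretrained generator: two layers with made-up roles.
rng = np.random.default_rng(1)
base_weights = {
    "shape_layer": rng.normal(size=(4, 4)),
    "texture_layer": rng.normal(size=(4, 4)),
}

hyper = TinyHyperNetwork(
    embed_dim=8,
    layer_shapes={name: w.shape for name, w in base_weights.items()},
)

# Random vector standing in for a text embedding of a style prompt.
style_embedding = rng.normal(size=8)
offsets = hyper(style_embedding)

# Restrict the edit to the texture layer; the shape layer is left untouched.
texture_only = stylize_weights(base_weights, offsets, {"texture_layer"})
```

In this sketch, restricting `editable_layers` is one plausible way to picture the fine-grained control the abstract mentions (texture only, shape only, or a local part); how HyperStyle3D actually selects parameters is not stated in the text above.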

