Kazunori Miyata

h-index 22

9 papers

101 citations

Novelty 44%

AI Score 28

Ranked #150,429 of 194,257 authors (top 77%); #146 in GR (top 84%)

9 Papers

8.4 | CV | Jun 13, 2023

AniFaceDrawing: Anime Portrait Exploration during Your Sketching

Zhengyu Huang, Haoran Xie, Tsukasa Fukusato et al.

In this paper, we focus on how artificial intelligence (AI) can be used to assist users in the creation of anime portraits, that is, converting rough sketches into anime portraits during their sketching process. The input is a sequence of incomplete freehand sketches that are gradually refined stroke by stroke, while the output is a sequence of high-quality anime portraits that correspond to the input sketches as guidance. Although recent GANs can generate high-quality images, it is a challenging problem to maintain the high quality of generated images from sketches with a low degree of completion due to ill-posed problems in conditional image generation. Even with the latest sketch-to-image (S2I) technology, it is still difficult to create high-quality images from incomplete rough sketches for anime portraits since the anime style tends to be more abstract than the realistic style. To address this issue, we adopt a latent space exploration of StyleGAN with a two-stage training strategy. We consider the input strokes of a freehand sketch to correspond to edge information-related attributes in the latent structural code of StyleGAN, and term the matching between strokes and these attributes stroke-level disentanglement. In the first stage, we trained an image encoder with the pre-trained StyleGAN model as a teacher encoder. In the second stage, we simulated the drawing process of the generated images without any additional data (labels) and trained the sketch encoder for incomplete progressive sketches to generate high-quality portrait images with feature alignment to the disentangled representations in the teacher encoder. We verified the proposed progressive S2I system with both qualitative and quantitative evaluations and achieved high-quality anime portraits from incomplete progressive sketches. Our user study proved its effectiveness in art creation assistance for the anime style.

6.8 | CV | Feb 14, 2023 | Code

DiffFaceSketch: High-Fidelity Face Image Synthesis with Sketch-Guided Latent Diffusion Model

Yichen Peng, Chunqi Zhao, Haoran Xie et al.

Synthesizing face images from monochrome sketches is one of the most fundamental tasks in the field of image-to-image translation. However, it is still challenging to (1) make models learn the high-dimensional face features such as geometry and color, and (2) take into account the characteristics of input sketches. Existing methods often use sketches as indirect inputs (or as auxiliary inputs) to guide the models, resulting in the loss of sketch features or the alteration of geometry information. In this paper, we introduce a Sketch-Guided Latent Diffusion Model (SGLDM), an LDM-based network architecture trained on the paired sketch-face dataset. We apply a Multi-Auto-Encoder (AE) to encode the different input sketches from different regions of a face from pixel space to a feature map in latent space, which enables us to reduce the dimension of the sketch input while preserving the geometry-related information of local face details. We build a sketch-face paired dataset based on the existing method that extracts the edge map from an image. We then introduce a Stochastic Region Abstraction (SRA), an approach to augment our dataset to improve the robustness of SGLDM to handle sketch input with arbitrary abstraction. The evaluation study shows that SGLDM can synthesize high-quality face images with different expressions, facial accessories, and hairstyles from various sketches with different abstraction levels.

5.1 | GR | Mar 1, 2023

Sketch2Cloth: Sketch-based 3D Garment Generation with Unsigned Distance Fields

Yi He, Haoran Xie, Kazunori Miyata

3D model reconstruction from a single image has achieved great progress with the recent deep generative models. However, the conventional reconstruction approaches with template mesh deformation and implicit fields have difficulty in reconstructing non-watertight 3D mesh models, such as garments. In contrast to image-based modeling, the sketch-based approach can help users generate 3D models to meet the design intentions from hand-drawn sketches. In this study, we propose Sketch2Cloth, a sketch-based 3D garment generation system using the unsigned distance fields from the user's sketch input. Sketch2Cloth first estimates the unsigned distance function of the target 3D model from the sketch input, and extracts the mesh from the estimated field with Marching Cubes. We also provide the model editing function to modify the generated mesh. We verified the proposed Sketch2Cloth with quantitative evaluations on garment generation and editing with a state-of-the-art approach.
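To illustrate why an unsigned distance field suits open surfaces such as garments, here is a minimal 2D sketch in Python. It is not the Sketch2Cloth network: the field is computed analytically from a known open curve rather than estimated from a sketch, and the function names (`point_segment_distance`, `unsigned_distance`, `sample_udf`) are hypothetical. An open curve has no inside or outside, so points on either side of it receive the same non-negative value.

```python
import math


def point_segment_distance(p, a, b):
    """Unsigned Euclidean distance from 2D point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def unsigned_distance(p, polyline):
    """Distance from p to the nearest segment of an open polyline."""
    return min(
        point_segment_distance(p, polyline[i], polyline[i + 1])
        for i in range(len(polyline) - 1)
    )


def sample_udf(polyline, resolution, lo=0.0, hi=1.0):
    """Sample the field on a resolution x resolution grid (resolution >= 2)."""
    step = (hi - lo) / (resolution - 1)
    return [
        [unsigned_distance((lo + i * step, lo + j * step), polyline)
         for i in range(resolution)]
        for j in range(resolution)
    ]
```

In a full pipeline, a surface would then be extracted near the zero level of such a field; that step is omitted here.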

9.6 | CV | Mar 27, 2024

ECNet: Effective Controllable Text-to-Image Diffusion Models

Sicheng Li, Keqiang Sun, Zhixin Lai et al.

Conditional text-to-image diffusion models have garnered significant attention in recent years. However, the precision of these models is often compromised mainly for two reasons: ambiguous condition input and inadequate condition guidance over a single denoising loss. To address these challenges, we introduce two innovative solutions. Firstly, we propose a Spatial Guidance Injector (SGI) which enhances conditional detail by encoding text inputs with precise annotation information. This method directly tackles the issue of ambiguous control inputs by providing clear, annotated guidance to the model. Secondly, to overcome the issue of limited conditional supervision, we introduce Diffusion Consistency Loss (DCL), which applies supervision on the denoised latent code at any given time step. This encourages consistency between the latent code at each time step and the input signal, thereby enhancing the robustness and accuracy of the output. The combination of SGI and DCL results in our Effective Controllable Network (ECNet), which offers a more accurate controllable end-to-end text-to-image generation framework with a more precise conditioning input and stronger controllable supervision. We validate our approach through extensive experiments on generation under various conditions, such as human body skeletons, facial landmarks, and sketches of general objects. The results consistently demonstrate that our method significantly enhances the controllability and robustness of the generated images, outperforming existing state-of-the-art controllable text-to-image models.
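The abstract does not give the form of DCL, so the following is only a sketch of the standard DDPM identity that makes supervision on a denoised latent possible at any time step: given a noisy latent and a noise prediction, a clean latent can be estimated in closed form. The function names and the squared-error loss are assumptions for illustration, not ECNet's actual implementation.

```python
import math


def add_noise(x0, eps, alpha_bar):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]


def estimate_x0(xt, eps_pred, alpha_bar):
    """Invert the forward step using a (predicted) noise vector."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - b * e) / a for x, e in zip(xt, eps_pred)]


def mse(u, v):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(u, v)) / len(u)


def denoised_latent_loss(x0, eps, eps_pred, alpha_bar):
    """Hypothetical loss: compare the estimated clean latent with the true one."""
    xt = add_noise(x0, eps, alpha_bar)
    return mse(estimate_x0(xt, eps_pred, alpha_bar), x0)
```

With a perfect noise prediction the estimated latent matches the clean one up to rounding, so the loss vanishes; any error in the prediction shows up directly in the latent, whatever the time step.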

6.4 | HC | Sep 7, 2021

SketchMeHow: Interactive Projection Guided Task Instruction with User Sketches

Haoran Xie, Yichen Peng, Hange Wang et al.

In this work, we propose SketchMeHow, an interactive general instruction framework that guides ordinary users in completing daily tasks in real time. In contrast to conventional augmented reality-based instruction systems, the proposed framework uses user sketches as system inputs to acquire the users' production intentions from the drawing interfaces. Given the user sketches, the designated task instruction can be analyzed based on the sub-task division and spatial localization for each task. A projector-camera system provides projection guidance to the end users with spatial augmented reality technology. To verify the proposed framework, we conducted two case studies of domino arrangement and bento production. Our user studies indicate that the proposed systems can help novice users complete the tasks efficiently and with satisfaction. We believe the proposed SketchMeHow can broaden the research topics in sketch-based real-world applications in human-computer interaction.

1.2 | GR | Aug 10, 2021

Stroke Correspondence by Labeling Closed Areas

Ryoma Miyauchi, Tsukasa Fukusato, Haoran Xie et al.

Constructing stroke correspondences between keyframes is one of the most important processes in the production pipeline of hand-drawn inbetweening frames. This process requires time-consuming manual work imposing a tremendous burden on the animators. We propose a method to estimate stroke correspondences between raster character images (keyframes) without vectorization processes. First, the proposed system separates the closed areas in each keyframe and estimates the correspondences between closed areas by using the characteristics of shape, depth, and closed area connection. Second, the proposed system estimates stroke correspondences from the estimated closed area correspondences. We demonstrate the effectiveness of our method by performing a user study and comparing the proposed system with conventional approaches.
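The abstract does not say how closed areas are separated. One common way to do this on a raster image is flood-fill labeling of the non-stroke pixels; the sketch below assumes a binary grid where 1 marks a stroke pixel and uses 4-connectivity. The function name `label_closed_areas` is hypothetical.

```python
from collections import deque


def label_closed_areas(image):
    """Label 4-connected regions of non-stroke pixels.

    image: list of rows, where 1 is a stroke pixel and 0 is an empty pixel.
    Returns (labels, count). Stroke pixels keep label 0; regions are
    numbered 1..count in scan order.
    """
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for start_y in range(height):
        for start_x in range(width):
            if image[start_y][start_x] == 1 or labels[start_y][start_x] != 0:
                continue  # stroke pixel, or already part of a region
            count += 1
            labels[start_y][start_x] = count
            queue = deque([(start_x, start_y)])
            while queue:  # breadth-first flood fill
                x, y = queue.popleft()
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and image[ny][nx] == 0 and labels[ny][nx] == 0):
                        labels[ny][nx] = count
                        queue.append((nx, ny))
    return labels, count
```

Per-region features such as shape or adjacency could then be computed from `labels` to match regions across keyframes; the background is also returned as a region and would need to be handled separately.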

1.2 | GR | Jun 17, 2021

Learning Perceptual Manifold of Fonts

Haoran Xie, Yuki Fujita, Kazunori Miyata

With the rapid development of deep learning techniques in generative models, it is becoming an urgent issue to combine machine intelligence with human intelligence to solve practical problems. Motivated by this methodology, this work aims to adjust machine-generated character fonts with the effort of human workers in a perception study. Although numerous fonts are available online for public usage, it is difficult and challenging to generate and explore a font that meets the preferences of common users. To solve this specific issue, we propose the perceptual manifold of fonts to visualize the perceptual adjustment in the latent space of a generative model of fonts. In our framework, we adopt a variational autoencoder network for font generation. Then, we conduct a perceptual study on the fonts generated from the multi-dimensional latent space of the generative model. After obtaining the distribution data of specific preferences, we use a manifold learning approach to visualize the font distribution. Our user study shows that, in contrast to a conventional user interface, the proposed font-exploring user interface is efficient and helpful for exploring fonts that match a designated user preference.

3.7 | HC | Jun 17, 2021

CoreUI: Interactive Core Training System with 3D Human Shape

Haoran Xie, Atsushi Watatani, Kazunori Miyata

In this paper, we present an interactive core training system that uses a monocular camera image as input. Capturing human pose with depth cameras or multiple cameras, as in conventional approaches, is commonly expensive. To solve this issue, we employ the skinned multi-person linear model of human shape to recover the 3D human pose from 2D images using pose estimation and human mesh recovery approaches. To support the user in maintaining the correct postures from target poses during training, we adopt 3D human shape estimation for both the target image and the input camera video. We propose CoreUI, a user interface that provides visual guidance showing the differences between the estimated target and current human shapes in core training, which are visualized by markers at ten body parts with color changes. Our user studies indicate that the proposed core training system is effective and convenient compared with the conventional guidance of 2D skeletons.
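As a rough illustration of marker-based guidance, the sketch below maps the distance between each target and current 3D joint position to a marker color. The joint names, the linear green-to-red ramp, and the `max_error` threshold are assumptions; the abstract only states that differences are shown by color-changing markers at ten body parts.

```python
import math


def joint_errors(target, current):
    """Euclidean distance per joint; both arguments map joint name -> (x, y, z)."""
    return {name: math.dist(target[name], current[name]) for name in target}


def error_to_color(error, max_error=0.3):
    """Linear ramp from green (no error) to red (error >= max_error), as RGB."""
    ratio = min(max(error / max_error, 0.0), 1.0)
    return (round(255 * ratio), round(255 * (1.0 - ratio)), 0)


def marker_colors(target, current, max_error=0.3):
    """Color for each joint marker, given target and current joint positions."""
    errors = joint_errors(target, current)
    return {name: error_to_color(err, max_error) for name, err in errors.items()}
```

A continuous ramp is only one option; a system could just as well use discrete color steps or per-joint thresholds.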

9.2 | GR | Apr 26, 2021 | Code

dualFace: Two-Stage Drawing Guidance for Freehand Portrait Sketching

Zhengyu Huang, Yichen Peng, Tomohiro Hibino et al.

In this paper, we propose dualFace, a portrait drawing interface to assist users with different levels of drawing skills to complete recognizable and authentic face sketches. dualFace consists of two-stage drawing assistance to provide global and local visual guidance: global guidance, which helps users draw contour lines of portraits (i.e., geometric structure), and local guidance, which helps users draw details of facial parts (which conform to user-drawn contour lines), inspired by traditional artist workflows in portrait drawing. In the stage of global guidance, the user draws several contour lines, and dualFace then searches several relevant images from an internal database and displays the suggested face contour lines over the background of the canvas. In the stage of local guidance, we synthesize detailed portrait images with a deep generative model from user-drawn contour lines, but use the synthesized results as detailed drawing guidance. We conducted a user study to verify the effectiveness of dualFace, and we confirmed that dualFace significantly helps achieve a detailed portrait sketch. See http://www.jaist.ac.jp/~xie/dualface.html