Yuki Koyama

CV
h-index5
10papers
202citations
Novelty51%
AI Score42

10 Papers

CVNov 18, 2022Code
A Structure-Guided Diffusion Model for Large-Hole Image Completion

Daichi Horita, Jiaolong Yang, Dong Chen et al.

Image completion techniques have made significant progress in filling missing regions (i.e., holes) in images. However, large-hole completion remains challenging due to limited structural information. In this paper, we address this problem by integrating explicit structural guidance into diffusion-based image completion, forming our structure-guided diffusion model (SGDM). It consists of two cascaded diffusion probabilistic models: structure and texture generators. The structure generator generates an edge image representing plausible structures within the holes, which is then used for guiding the texture generation process. To train both generators jointly, we devise a novel strategy that leverages optimal Bayesian denoising, which denoises the output of the structure generator in a single step and thus allows backpropagation. Our diffusion-based approach enables a diversity of plausible completions, while the editable edges allow for editing parts of an image. Our experiments on natural scene (Places) and face (CelebA-HQ) datasets demonstrate that our method achieves a superior or comparable visual quality compared to state-of-the-art approaches. The code is available for research purposes at https://github.com/UdonDa/Structure_Guided_Diffusion_Model.

HCDec 16, 2025
LAPPI: Interactive Optimization with LLM-Assisted Preference-Based Problem Instantiation

So Kuroki, Manami Nakagawa, Shigeo Yoshida et al.

Many real-world tasks, such as trip planning or meal planning, can be formulated as combinatorial optimization problems. However, using optimization solvers is difficult for end users because it requires problem instantiation: defining candidate items, assigning preference scores, and specifying constraints. We introduce LAPPI (LLM-Assisted Preference-based Problem Instantiation), an interactive approach that uses large language models (LLMs) to support users in this instantiation process. Through natural language conversations, the system helps users transform vague preferences into well-defined optimization problems. These instantiated problems are then passed to existing optimization solvers to generate solutions. In a user study on trip planning, our method successfully captured user preferences and generated feasible plans that outperformed both conventional and prompt-engineering approaches. We further demonstrate LAPPI's versatility by adapting it to an additional use case.

CVMar 11, 2024
FontCLIP: A Semantic Typography Visual-Language Model for Multilingual Font Applications

Yuki Tatsukawa, I-Chao Shen, Anran Qi et al.

Acquiring the desired font for various design tasks can be challenging and requires professional typographic knowledge. While previous font retrieval or generation works have alleviated some of these difficulties, they often lack support for multiple languages and semantic attributes beyond the training data domains. To solve this problem, we present FontCLIP: a model that connects the semantic understanding of a large vision-language model with typographical knowledge. We integrate typography-specific knowledge into the comprehensive vision-language knowledge of a pretrained CLIP model through a novel finetuning approach. We propose to use a compound descriptive prompt that encapsulates adaptively sampled attributes from a font attribute dataset focusing on Roman alphabet characters. FontCLIP's semantic typographic latent space demonstrates two unprecedented generalization abilities. First, FontCLIP generalizes to different languages including Chinese, Japanese, and Korean (CJK), capturing the typographical features of fonts across different languages, even though it was only finetuned using fonts of Roman characters. Second, FontCLIP can recognize the semantic attributes that are not presented in the training data. FontCLIP's dual-modality and generalization abilities enable multilingual and cross-lingual font retrieval and letter shape optimization, reducing the burden of obtaining desired fonts.

HCAug 22, 2025
Cooperative Design Optimization through Natural Language Interaction

Ryogo Niwa, Shigeo Yoshida, Yuki Koyama et al.

Designing successful interactions requires identifying optimal design parameters. To do so, designers often conduct iterative user testing and exploratory trial-and-error. This involves balancing multiple objectives in a high-dimensional space, making the process time-consuming and cognitively demanding. System-led optimization methods, such as those based on Bayesian optimization, can determine for designers which parameters to test next. However, they offer limited opportunities for designers to intervene in the optimization process, negatively impacting the designer's experience. We propose a design optimization framework that enables natural language interactions between designers and the optimization system, facilitating cooperative design optimization. This is achieved by integrating system-led optimization methods with Large Language Models (LLMs), allowing designers to intervene in the optimization process and better understand the system's reasoning. Experimental results show that our method provides higher user agency than a system-led method and shows promising optimization performance compared to manual design. It also matches the performance of an existing cooperative method with lower cognitive load.

ROJul 26, 2021
Autonomous Coordinated Control of the Light Guide for Positioning in Vitreoretinal Surgery

Yuki Koyama, Murilo M. Marinho, Mamoru Mitsuishi et al.

Vitreoretinal surgery is challenging even for expert surgeons owing to the delicate target tissues and the diminutive workspace in the retina. In addition to improved dexterity and accuracy, robot assistance allows for (partial) task automation. In this work, we propose a strategy to automate the motion of the light guide with respect to the surgical instrument. This automation allows the instrument's shadow to always be inside the microscopic view, which is an important cue for the accurate positioning of the instrument in the retina. We show simulations and experiments demonstrating that the proposed strategy is effective in a 700-point grid in the retina of a surgical phantom. Furthermore, we integrated the proposed strategy with image processing and succeeded in positioning the surgical instrument's tip in the retina, relying on only the robot's geometric information and microscopic images.

LGMay 19, 2021
Tool- and Domain-Agnostic Parameterization of Style Transfer Effects Leveraging Pretrained Perceptual Metrics

Hiromu Yakura, Yuki Koyama, Masataka Goto

Current deep learning techniques for style transfer would not be optimal for design support since their "one-shot" transfer does not fit exploratory design processes. To overcome this gap, we propose parametric transcription, which transcribes an end-to-end style transfer effect into parameter values of specific transformations available in an existing content editing tool. With this approach, users can imitate the style of a reference sample in the tool that they are familiar with and thus can easily continue further exploration by manipulating the parameters. To enable this, we introduce a framework that utilizes an existing pretrained model for style transfer to calculate a perceptual style distance to the reference sample and uses black-box optimization to find the parameters that minimize this distance. Our experiments with various third-party tools, such as Instagram and Blender, show that our framework can effectively leverage deep learning techniques for computational design support.

SDOct 7, 2020
Generative Melody Composition with Human-in-the-Loop Bayesian Optimization

Yijun Zhou, Yuki Koyama, Masataka Goto et al.

Deep generative models allow even novice composers to generate various melodies by sampling latent vectors. However, finding the desired melody is challenging since the latent space is unintuitive and high-dimensional. In this work, we present an interactive system that supports generative melody composition with human-in-the-loop Bayesian optimization (BO). This system takes a mixed-initiative approach; the system generates candidate melodies to evaluate, and the user evaluates them and provides preferential feedback (i.e., picking the best melody among the candidates) to the system. This process is iteratively performed based on BO techniques until the user finds the desired melody. We conducted a pilot study using our prototype system, suggesting the potential of this approach.

GRMay 8, 2020
Sequential Gallery for Interactive Visual Design Optimization

Yuki Koyama, Issei Sato, Masataka Goto

Visual design tasks often involve tuning many design parameters. For example, color grading of a photograph involves many parameters, some of which non-expert users might be unfamiliar with. We propose a novel user-in-the-loop optimization method that allows users to efficiently find an appropriate parameter set by exploring such a high-dimensional design space through much easier two-dimensional search subtasks. This method, called sequential plane search, is based on Bayesian optimization to keep necessary queries to users as few as possible. To help users respond to plane-search queries, we also propose using a gallery-based interface that provides options in the two-dimensional subspace arranged in an adaptive grid view. We call this interactive framework Sequential Gallery since users sequentially select the best option from the options provided by the interface. Our experiment with synthetic functions shows that our sequential plane search can find satisfactory solutions in fewer iterations than baselines. We also conducted a preliminary user study, results of which suggest that novices can effectively complete search tasks with Sequential Gallery in a photo-enhancement scenario.

CVApr 8, 2020
MirrorNet: A Deep Bayesian Approach to Reflective 2D Pose Estimation from Human Images

Takayuki Nakatsuka, Kazuyoshi Yoshii, Yuki Koyama et al.

This paper proposes a statistical approach to 2D pose estimation from human images. The main problems with the standard supervised approach, which is based on a deep recognition (image-to-pose) model, are that it often yields anatomically implausible poses, and its performance is limited by the amount of paired data. To solve these problems, we propose a semi-supervised method that can make effective use of images with and without pose annotations. Specifically, we formulate a hierarchical generative model of poses and images by integrating a deep generative model of poses from pose features with that of images from poses and image features. We then introduce a deep recognition model that infers poses from images. Given images as observed data, these models can be trained jointly in a hierarchical variational autoencoding (image-to-pose-to-feature-to-pose-to-image) manner. The results of experiments show that the proposed reflective architecture makes estimated poses anatomically plausible, and the performance of pose estimation improved by integrating the recognition and generative models and also by feeding non-annotated images.

GRFeb 20, 2020
Computational Design with Crowds

Yuki Koyama, Takeo Igarashi

Computational design is aimed at supporting or automating design processes using computational techniques. However, some classes of design tasks involve criteria that are difficult to handle only with computers. For example, visual design tasks seeking to fulfill aesthetic goals are difficult to handle purely with computers. One promising approach is to leverage human computation; that is, to incorporate human input into the computation process. Crowdsourcing platforms provide a convenient way to integrate such human computation into a working system. In this chapter, we discuss such computational design with crowds in the domain of parameter tweaking tasks in visual design. Parameter tweaking is often performed to maximize the aesthetic quality of designed objects. Computational design powered by crowds can solve this maximization problem by leveraging human computation. We discuss the opportunities and challenges of computational design with crowds with two illustrative examples: (1) estimating the objective function (specifically, preference learning from crowds' pairwise comparisons) to facilitate interactive design exploration by a designer and (2) directly searching for the optimal parameter setting that maximizes the objective function (specifically, crowds-in-the-loop Bayesian optimization).