Chin-Yi Cheng

CV
h-index117
18papers
4,190citations
Novelty52%
AI Score40

18 Papers

CVJul 11, 2022Code
SkexGen: Autoregressive Generation of CAD Construction Sequences with Disentangled Codebooks

Xiang Xu, Karl D. D. Willis, Joseph G. Lambourne et al.

We present SkexGen, a novel autoregressive generative model for computer-aided design (CAD) construction sequences containing sketch-and-extrude modeling operations. Our model utilizes distinct Transformer architectures to encode topological, geometric, and extrusion variations of construction sequences into disentangled codebooks. Autoregressive Transformer decoders generate CAD construction sequences sharing certain properties specified by the codebook vectors. Extensive experiments demonstrate that our disentangled codebook representation generates diverse and high-quality CAD models, enhances user control, and enables efficient exploration of the design space. The code is available at https://samxuxiang.github.io/skexgen.

CVJul 25, 2022
Translating a Visual LEGO Manual to a Machine-Executable Plan

Ruocheng Wang, Yunzhi Zhang, Jiayuan Mao et al. · stanford

We study the problem of translating an image-based, step-by-step assembly manual created by human designers into machine-interpretable instructions. We formulate this problem as a sequential prediction task: at each step, our model reads the manual, locates the components to be added to the current shape, and infers their 3D poses. This task poses the challenge of establishing a 2D-3D correspondence between the manual image and the real 3D object, and 3D pose estimation for unseen 3D objects, since a new component to be added in a step can be an object built from previous steps. To address these two challenges, we present a novel learning-based framework, the Manual-to-Executable-Plan Network (MEPNet), which reconstructs the assembly steps from a sequence of manual images. The key idea is to integrate neural 2D keypoint detection modules and 2D-3D projection algorithms for high-precision prediction and strong generalization to unseen components. The MEPNet outperforms existing methods on three newly collected LEGO manual datasets and a Minecraft house dataset.

CVFeb 3, 2023
IKEA-Manual: Seeing Shape Assembly Step by Step

Ruocheng Wang, Yunzhi Zhang, Jiayuan Mao et al. · stanford

Human-designed visual manuals are crucial components in shape assembly activities. They provide step-by-step guidance on how we should move and connect different parts in a convenient and physically-realizable way. While there has been an ongoing effort in building agents that perform assembly tasks, the information in human-design manuals has been largely overlooked. We identify that this is due to 1) a lack of realistic 3D assembly objects that have paired manuals and 2) the difficulty of extracting structured information from purely image-based manuals. Motivated by this observation, we present IKEA-Manual, a dataset consisting of 102 IKEA objects paired with assembly manuals. We provide fine-grained annotations on the IKEA objects and assembly manuals, including decomposed assembly parts, assembly plans, manual segmentation, and 2D-3D correspondence between 3D parts and visual manuals. We illustrate the broad application of our dataset on four tasks related to shape assembly: assembly plan generation, part segmentation, pose estimation, and 3D part assembly.

LGJan 27, 2023
PLay: Parametrically Conditioned Layout Generation using Latent Diffusion

Chin-Yi Cheng, Forrest Huang, Gang Li et al.

Layout design is an important task in various design fields, including user interface, document, and graphic design. As this task requires tedious manual effort by designers, prior works have attempted to automate this process using generative models, but commonly fell short of providing intuitive user controls and achieving design objectives. In this paper, we build a conditional latent diffusion model, PLay, that generates parametrically conditioned layouts in vector graphic space from user-specified guidelines, which are commonly used by designers for representing their design intents in current practices. Our method outperforms prior works across three datasets on metrics including FID and FD-VG, and in user study. Moreover, it brings a novel and interactive experience to professional layout design processes.

CVNov 30, 2023
A-Scan2BIM: Assistive Scan to Building Information Modeling

Weilian Song, Jieliang Luo, Dale Zhao et al.

This paper proposes an assistive system for architects that converts a large-scale point cloud into a standardized digital representation of a building for Building Information Modeling (BIM) applications. The process is known as Scan-to-BIM, which requires many hours of manual work even for a single building floor by a professional architect. Given its challenging nature, the paper focuses on helping architects on the Scan-to-BIM process, instead of replacing them. Concretely, we propose an assistive Scan-to-BIM system that takes the raw sensor data and edit history (including the current BIM model), then auto-regressively predicts a sequence of model editing operations as APIs of a professional BIM software (i.e., Autodesk Revit). The paper also presents the first building-scale Scan2BIM dataset that contains a sequence of model editing operations as the APIs of Autodesk Revit. The dataset contains 89 hours of Scan2BIM modeling processes by professional architects over 16 scenes, spanning over 35,000 m^2. We report our system's reconstruction quality with standard metrics, and we introduce a novel metric that measures how natural the order of reconstructed operations is. A simple modification to the reconstruction module helps improve performance, and our method is far superior to two other baselines in the order metric. We will release data, code, and models at a-scan2bim.github.io.

CLJul 7, 2025
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Gheorghe Comanici, Eric Bieber, Mike Schaekermann et al. · amazon-science, baidu

In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.

LGSep 5, 2023
Representation Learning for Sequential Volumetric Design Tasks

Md Ferdous Alam, Yi Wang, Chin-Yi Cheng et al.

Volumetric design, also called massing design, is the first and critical step in professional building design which is sequential in nature. As the volumetric design process requires careful design decisions and iterative adjustments, the underlying sequential design process encodes valuable information for designers. Many efforts have been made to automatically generate reasonable volumetric designs, but the quality of the generated design solutions varies, and evaluating a design solution requires either a prohibitively comprehensive set of metrics or expensive human expertise. While previous approaches focused on learning only the final design instead of sequential design tasks, we propose to encode the design knowledge from a collection of expert or high-performing design sequences and extract useful representations using transformer-based models. Later we propose to utilize the learned representations for crucial downstream applications such as design preference evaluation and procedural design generation. We develop the preference model by estimating the density of the learned representations whereas we train an autoregressive transformer model for sequential design generation. We demonstrate our ideas by leveraging a novel dataset of thousands of sequential volumetric designs. Our preference model can compare two arbitrarily given design sequences and is almost $90\%$ accurate in evaluation against random design sequences. Our autoregressive model is also capable of autocompleting a volumetric design sequence from a partial design sequence.

HCMay 18, 2024
CoLay: Controllable Layout Generation through Multi-conditional Latent Diffusion

Chin-Yi Cheng, Ruiqi Gao, Forrest Huang et al.

Layout design generation has recently gained significant attention due to its potential applications in various fields, including UI, graphic, and floor plan design. However, existing models face two main challenges that limits their adoption in practice. Firstly, the limited expressiveness of individual condition types used in previous works restricts designers' ability to convey complex design intentions and constraints. Secondly, most existing models focus on generating labels and coordinates, while real layouts contain a range of style properties. To address these limitations, we propose a novel framework, CoLay, that integrates multiple condition types and generates complex layouts with diverse style properties. Our approach outperforms prior works in terms of generation quality and condition satisfaction while empowering users to express their design intents using a flexible combination of modalities, including natural language prompts, layout guidelines, element types, and partially completed designs.

AIDec 22, 2024
Visual Prompting with Iterative Refinement for Design Critique Generation

Peitong Duan, Chin-Yi Cheng, Bjoern Hartmann et al.

Feedback is crucial for every design process, such as user interface (UI) design, and automating design critiques can significantly improve the efficiency of the design workflow. Although existing multimodal large language models (LLMs) excel in many tasks, they often struggle with generating high-quality design critiques -- a complex task that requires producing detailed design comments that are visually grounded in a given design's image. Building on recent advancements in iterative refinement of text output and visual prompting methods, we propose an iterative visual prompting approach for UI critique that takes an input UI screenshot and design guidelines and generates a list of design comments, along with corresponding bounding boxes that map each comment to a specific region in the screenshot. The entire process is driven completely by LLMs, which iteratively refine both the text output and bounding boxes using few-shot samples tailored for each step. We evaluated our approach using Gemini-1.5-pro and GPT-4o, and found that human experts generally preferred the design critiques generated by our pipeline over those by the baseline, with the pipeline reducing the gap from human performance by 50% for one rating metric. To assess the generalizability of our approach to other multimodal tasks, we applied our pipeline to open-vocabulary object and attribute detection, and experiments showed that our method also outperformed the baseline.

CLMay 16, 2024
Leveraging Human Revisions for Improving Text-to-Layout Models

Amber Xie, Chin-Yi Cheng, Forrest Huang et al.

Learning from human feedback has shown success in aligning large, pretrained models with human values. Prior works have mostly focused on learning from high-level labels, such as preferences between pairs of model outputs. On the other hand, many domains could benefit from more involved, detailed feedback, such as revisions, explanations, and reasoning of human users. Our work proposes using nuanced feedback through the form of human revisions for stronger alignment. In this paper, we ask expert designers to fix layouts generated from a generative layout model that is pretrained on a large-scale dataset of mobile screens. Then, we train a reward model based on how human designers revise these generated layouts. With the learned reward model, we optimize our model with reinforcement learning from human feedback (RLHF). Our method, Revision-Aware Reward Models ($\method$), allows a generative text-to-layout model to produce more modern, designer-aligned layouts, showing the potential for utilizing human revisions and stronger forms of feedback in improving generative models.

CVOct 6, 2021
CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation

Aditya Sanghi, Hang Chu, Joseph G. Lambourne et al.

Generating shapes using natural language can enable new ways of imagining and creating the things around us. While significant recent progress has been made in text-to-image generation, text-to-shape generation remains a challenging problem due to the unavailability of paired text and shape data at a large scale. We present a simple yet effective method for zero-shot text-to-shape generation that circumvents such data scarcity. Our proposed method, named CLIP-Forge, is based on a two-stage training process, which only depends on an unlabelled shape dataset and a pre-trained image-text network such as CLIP. Our method has the benefits of avoiding expensive inference time optimization, as well as the ability to generate multiple shapes for a given text. We not only demonstrate promising zero-shot generalization of the CLIP-Forge model qualitatively and quantitatively, but also provide extensive comparative evaluations to better understand its behavior.

LGJul 19, 2021
Structural Design Recommendations in the Early Design Phase using Machine Learning

Spyridon Ampanavos, Mehdi Nourbakhsh, Chin-Yi Cheng

Structural engineering knowledge can be of significant importance to the architectural design team during the early design phase. However, architects and engineers do not typically work together during the conceptual phase; in fact, structural engineers are often called late into the process. As a result, updates in the design are more difficult and time-consuming to complete. At the same time, there is a lost opportunity for better design exploration guided by structural feedback. In general, the earlier in the design process the iteration happens, the greater the benefits in cost efficiency and informed de-sign exploration, which can lead to higher-quality creative results. In order to facilitate an informed exploration in the early design stage, we suggest the automation of fundamental structural engineering tasks and introduce ApproxiFramer, a Machine Learning-based system for the automatic generation of structural layouts from building plan sketches in real-time. The system aims to assist architects by presenting them with feasible structural solutions during the conceptual phase so that they proceed with their design with adequate knowledge of its structural implications. In this paper, we describe the system and evaluate the performance of a proof-of-concept implementation in the domain of orthogonal, metal, rigid structures. We trained a Convolutional Neural Net to iteratively generate structural design solutions for sketch-level building plans using a synthetic dataset and achieved an average error of 2.2% in the predicted positions of the columns.

LGApr 27, 2021
Building-GAN: Graph-Conditioned Architectural Volumetric Design Generation

Kai-Hung Chang, Chin-Yi Cheng, Jieliang Luo et al.

Volumetric design is the first and critical step for professional building design, where architects not only depict the rough 3D geometry of the building but also specify the programs to form a 2D layout on each floor. Though 2D layout generation for a single story has been widely studied, there is no developed method for multi-story buildings. This paper focuses on volumetric design generation conditioned on an input program graph. Instead of outputting dense 3D voxels, we propose a new 3D representation named voxel graph that is both compact and expressive for building geometries. Our generator is a cross-modal graph neural network that uses a pointer mechanism to connect the input program graph and the output voxel graph, and the whole pipeline is trained using the adversarial framework. The generated designs are evaluated qualitatively by a user study and quantitatively using three metrics: quality, diversity, and connectivity accuracy. We show that our model generates realistic 3D volumetric designs and outperforms previous methods and baselines.

CVMar 3, 2021
House-GAN++: Generative Adversarial Layout Refinement Networks

Nelson Nauata, Sepidehsadat Hosseini, Kai-Hung Chang et al.

This paper proposes a novel generative adversarial layout refinement network for automated floorplan generation. Our architecture is an integration of a graph-constrained relational GAN and a conditional GAN, where a previously generated layout becomes the next input constraint, enabling iterative refinement. A surprising discovery of our research is that a simple non-iterative training process, dubbed component-wise GT-conditioning, is effective in learning such a generator. The iterative generator also creates a new opportunity in further improving a metric of choice via meta-optimization techniques by controlling when to pass which input constraints during iterative layout refinement. Our qualitative and quantitative evaluation based on the three standard metrics demonstrate that the proposed system makes significant improvements over the current state-of-the-art, even competitive against the ground-truth floorplans, designed by professional architects.

HCSep 1, 2020
SketchOpt: Sketch-based Parametric Model Retrieval for Generative Design

Mohammad Keshavarzi, Clayton Hutson, Chin-Yi Cheng et al.

Developing fully parametric building models for performance-based generative design tasks often requires proficiency in many advanced 3D modeling and visual programming, limiting its use for many building designers. Moreover, iterations of such models can be time-consuming tasks and sometimes limiting, as major changes in the layout design may result in remodeling the entire parametric definition. To address these challenges, we introduce a novel automated generative design system, which takes a basic floor plan sketch as an input and provides a parametric model prepared for multi-objective building optimization as output. Furthermore, the user-designer can assign various design variables for its desired building elements by using simple annotations in the drawing. The system would recognize the corresponding element and define variable constraints to prepare for a multi-objective optimization problem.

LGMar 20, 2020
Learning to simulate and design for structural engineering

Kai-Hung Chang, Chin-Yi Cheng

The structural design process for buildings is time-consuming and laborious. To automate this process, structural engineers combine optimization methods with simulation tools to find an optimal design with minimal building mass subject to building regulations. However, structural engineers in practice often avoid optimization and compromise on a suboptimal design for the majority of buildings, due to the large size of the design space, the iterative nature of the optimization methods, and the slow simulation tools. In this work, we formulate the building structures as graphs and create an end-to-end pipeline that can learn to propose the optimal cross-sections of columns and beams by training together with a pre-trained differentiable structural simulator. The performance of the proposed structural designs is comparable to the ones optimized by genetic algorithm (GA), with all the constraints satisfied. The optimal structural design with the reduced the building mass can not only lower the material cost, but also decrease the carbon footprint.

CVMar 16, 2020
House-GAN: Relational Generative Adversarial Networks for Graph-constrained House Layout Generation

Nelson Nauata, Kai-Hung Chang, Chin-Yi Cheng et al.

This paper proposes a novel graph-constrained generative adversarial network, whose generator and discriminator are built upon relational architecture. The main idea is to encode the constraint into the graph structure of its relational networks. We have demonstrated the proposed architecture for a new house layout generation problem, whose task is to take an architectural constraint as a graph (i.e., the number and types of rooms with their spatial adjacency) and produce a set of axis-aligned bounding boxes of rooms. We measure the quality of generated house layouts with the three metrics: the realism, the diversity, and the compatibility with the input graph constraint. Our qualitative and quantitative evaluations over 117,000 real floorplan images demonstrate that the proposed approach outperforms existing methods and baselines. We will publicly share all our code and data.

CVSep 21, 2018
Unsupervised Image to Sequence Translation with Canvas-Drawer Networks

Kevin Frans, Chin-Yi Cheng

Encoding images as a series of high-level constructs, such as brush strokes or discrete shapes, can often be key to both human and machine understanding. In many cases, however, data is only available in pixel form. We present a method for generating images directly in a high-level domain (e.g. brush strokes), without the need for real pairwise data. Specifically, we train a "canvas" network to imitate the mapping of high-level constructs to pixels, followed by a high-level "drawing" network which is optimized through this mapping towards solving a desired image recreation or translation task. We successfully discover sequential vector representations of symbols, large sketches, and 3D objects, utilizing only pixel data. We display applications of our method in image segmentation, and present several ablation studies comparing various configurations.