PLLM: Pseudo-Labeling Large Language Models for CAD Program Synthesis
PLLM addresses the challenge of limited labeled data for CAD program synthesis by enabling training on unlabeled shapes, an incremental improvement over purely supervised methods.
The paper tackles CAD program synthesis from 3D geometries without paired shape-program data by introducing PLLM, a self-training framework that uses pseudo-labeling to generate synthetic training pairs. It reports improved geometric fidelity and program diversity on the ABC dataset.
Recovering Computer-Aided Design (CAD) programs from 3D geometries is a widely studied problem. Recent advances in large language models (LLMs) have enabled progress in CAD program synthesis, but existing methods rely on supervised training with paired shape-program data, which is often unavailable. We introduce PLLM, a self-training framework for CAD program synthesis from unlabeled 3D shapes. Given a pre-trained CAD-capable LLM and a shape dataset, PLLM iteratively samples candidate programs, selects high-fidelity executions, and augments programs to construct synthetic program-shape pairs for fine-tuning. Experiments on adapting CAD-Recode from DeepCAD to the unlabeled ABC dataset show consistent improvements in geometric fidelity and program diversity.
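The sample, select, and augment steps described above can be illustrated with a minimal toy sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: shapes and programs are reduced to the three dimensions of an origin-anchored box, `execute` stands in for a CAD kernel, `iou` stands in for the fidelity metric, `noisy_sampler` stands in for sampling from the LLM, and `augment` is one hypothetical augmentation. Pairing each program with its own executed shape is also an assumption; the abstract does not say how the pairs are formed. Fine-tuning itself is omitted.

```python
import random
from typing import Callable, List, Tuple

Box = Tuple[float, ...]      # toy "shape": width, depth, height (assumption)
Program = Tuple[float, ...]  # toy "program": parameters of one box (assumption)


def execute(program: Program) -> Box:
    """Toy executor: running a program yields the box it describes."""
    return program


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes anchored at the origin."""
    inter = min(a[0], b[0]) * min(a[1], b[1]) * min(a[2], b[2])
    union = a[0] * a[1] * a[2] + b[0] * b[1] * b[2] - inter
    return inter / union


def noisy_sampler(shape: Box, k: int, rng: random.Random) -> List[Program]:
    """Hypothetical stand-in for sampling k candidate programs from an LLM."""
    return [tuple(d * rng.uniform(0.7, 1.3) for d in shape) for _ in range(k)]


def augment(program: Program, rng: random.Random) -> Program:
    """Hypothetical augmentation: rescale all parameters by a common factor."""
    scale = rng.uniform(0.5, 2.0)
    return tuple(d * scale for d in program)


def pseudo_label_round(
    shapes: List[Box],
    sampler: Callable[[Box, int, random.Random], List[Program]],
    k: int,
    threshold: float,
    rng: random.Random,
) -> List[Tuple[Box, Program]]:
    """One round: sample candidates, keep the best if faithful, then augment."""
    pairs = []
    for shape in shapes:
        candidates = sampler(shape, k, rng)
        best = max(candidates, key=lambda p: iou(execute(p), shape))
        if iou(execute(best), shape) < threshold:
            continue  # no candidate is faithful enough; skip this shape
        # Pair every kept program with the shape it actually produces,
        # so each synthetic pair is exact by construction.
        for prog in (best, augment(best, rng)):
            pairs.append((execute(prog), prog))
    return pairs


rng = random.Random(0)
shapes = [(1.0, 2.0, 3.0), (2.0, 2.0, 0.5)]
pairs = pseudo_label_round(shapes, noisy_sampler, k=8, threshold=0.6, rng=rng)
print(f"{len(pairs)} synthetic (shape, program) pairs ready for fine-tuning")
```

In a full self-training loop, the returned pairs would be used to fine-tune the model, and the updated model would replace `noisy_sampler` in the next round.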