AI · Aug 4, 2025

"Stack It Up!": 3D Stable Structure Generation from 2D Hand-drawn Sketch

arXiv:2508.02093v1 · 2 citations · h-index: 5
Originality: Highly original
AI Analysis

The paper targets non-experts, such as children or casual users, who want to create 3D models from sketches. The contribution is incremental, however, because it builds on existing methods such as diffusion models for 3D generation.

The paper tackles the problem of generating stable 3D structures from 2D hand-drawn sketches, so that non-experts can specify complex designs without precise 3D poses or expert tools. The resulting system consistently produces stable, multilevel structures with high visual resemblance to the sketch and outperforms all baselines.

Imagine a child sketching the Eiffel Tower and asking a robot to bring it to life. Today's robot manipulation systems can't act on such sketches directly: they require precise 3D block poses as goals, which in turn demand structural analysis and expert tools like CAD. We present StackItUp, a system that enables non-experts to specify complex 3D structures using only 2D front-view hand-drawn sketches. StackItUp introduces an abstract relation graph to bridge the gap between rough sketches and accurate 3D block arrangements, capturing the symbolic geometric relations (e.g., left-of) and stability patterns (e.g., two-pillar-bridge) while discarding noisy metric details from sketches. It then grounds this graph to 3D poses using compositional diffusion models and iteratively updates it by predicting hidden internal and rear supports, which are critical for stability but absent from the sketch. Evaluated on sketches of iconic landmarks and modern house designs, StackItUp consistently produces stable, multilevel 3D structures and outperforms all baselines in both stability and visual resemblance.
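
To make the idea of an abstract relation graph concrete, here is a minimal Python sketch. It is not the authors' implementation: the `Box` input format, the relation name `on-top-of`, the tolerance `tol`, and the helper names `build_relation_graph` and `two_pillar_bridges` are all assumptions made for illustration. It only shows how noisy sketch coordinates might be reduced to symbolic relations such as left-of, and how a two-pillar-bridge pattern might then be detected from those relations alone. The diffusion-based grounding to 3D poses and the prediction of hidden supports are not covered.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Box:
    """Axis-aligned outline of one block in a front-view sketch (hypothetical format)."""
    name: str
    x_min: float
    x_max: float
    y_min: float  # bottom edge
    y_max: float  # top edge


def left_of(a: Box, b: Box) -> bool:
    # a lies entirely to the left of b
    return a.x_max <= b.x_min


def on_top_of(a: Box, b: Box, tol: float) -> bool:
    # a rests on b: a's bottom is within tol of b's top and they overlap horizontally
    overlap = min(a.x_max, b.x_max) - max(a.x_min, b.x_min)
    return abs(a.y_min - b.y_max) <= tol and overlap > 0


def build_relation_graph(boxes, tol=0.2):
    """Return symbolic (subject, relation, object) edges; metric values are discarded."""
    edges = set()
    for a, b in permutations(boxes, 2):
        if left_of(a, b):
            edges.add((a.name, "left-of", b.name))
        if on_top_of(a, b, tol):
            edges.add((a.name, "on-top-of", b.name))
    return edges


def two_pillar_bridges(edges):
    """Find (beam, left pillar, right pillar) triples using only the symbolic edges."""
    supports = {}
    for subject, relation, obj in edges:
        if relation == "on-top-of":
            supports.setdefault(subject, []).append(obj)
    found = []
    for beam, pillars in supports.items():
        for p, q in permutations(pillars, 2):
            if (p, "left-of", q) in edges:
                found.append((beam, p, q))
    return found


# A rough sketch: the pillars have slightly different heights and the beam overhangs.
sketch = [
    Box("pillar_l", 0.0, 1.0, 0.0, 3.1),
    Box("pillar_r", 3.0, 4.0, 0.0, 2.9),
    Box("beam", -0.1, 4.2, 3.0, 4.0),
]
graph = build_relation_graph(sketch)
bridges = two_pillar_bridges(graph)
```

In this toy input the two pillars differ in height by 0.2 units, yet the symbolic graph still records the beam as resting on both of them. That is the sense in which such a graph can discard noisy metric detail while keeping the structure a downstream pose generator would need.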

