CL · CV · Mar 14, 2025

TikZero: Zero-Shot Text-Guided Graphics Program Synthesis

arXiv:2503.11509v3 · 16 citations · h-index: 12 · Has Code
Originality: Highly original
AI Analysis

This work addresses the challenge of generating precise, editable figures from text for design and documentation, combining disparate data sources to overcome the scarcity of aligned training data.

The paper introduces TikZero to synthesize graphics programs from text captions without aligned training data. TikZero uses image representations as a bridge, so the model can be trained separately on unaligned graphics programs and on captioned images and then synthesize programs from text zero-shot at inference time. It substantially outperforms baselines restricted to aligned data and, when aligned data is added as a complementary training signal, matches or exceeds much larger models such as GPT-4o.

Automatically synthesizing figures from text captions is a compelling capability. However, achieving high geometric precision and editability requires representing figures as graphics programs in languages like TikZ, and aligned training data (i.e., graphics programs with captions) remains scarce. Meanwhile, large amounts of unaligned graphics programs and captioned raster images are more readily available. We reconcile these disparate data sources by presenting TikZero, which decouples graphics program generation from text understanding by using image representations as an intermediary bridge. It enables independent training on graphics programs and captioned images and allows for zero-shot text-guided graphics program synthesis during inference. We show that our method substantially outperforms baselines that can only operate with caption-aligned graphics programs. Furthermore, when leveraging caption-aligned graphics programs as a complementary training signal, TikZero matches or exceeds the performance of much larger models, including commercial systems like GPT-4o. Our code, datasets, and select models are publicly available.
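The decoupling idea in the abstract can be illustrated with a deliberately tiny toy model. This is not TikZero's architecture: the class names `NearestProgramDecoder` and `MeanTokenAdapter`, the 2-D "image embeddings", and the lookup-table learning are all hypothetical stand-ins for a vision encoder, a learned text adapter, and a language-model decoder. The point it shows is structural: the decoder only ever sees (image embedding, program) pairs, the adapter only ever sees (caption, image embedding) pairs, and chaining them yields text-to-program synthesis without a single caption-program example.

```python
# Toy sketch (hypothetical, not the paper's method): zero-shot text -> TikZ
# program synthesis through a shared image-embedding space.
import math
from collections import defaultdict

Vec = tuple[float, ...]


def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class NearestProgramDecoder:
    """Stage 1 stand-in: 'trained' only on (image embedding, program) pairs.

    Assumption: each program was rendered and encoded into a vector beforehand.
    Decoding returns the program whose rendering is closest to the query.
    """

    def __init__(self, rendered: list[tuple[Vec, str]]):
        self.rendered = rendered

    def decode(self, emb: Vec) -> str:
        return max(self.rendered, key=lambda pair: cosine(pair[0], emb))[1]


class MeanTokenAdapter:
    """Stage 2 stand-in: 'trained' only on (caption, image embedding) pairs.

    Each caption token is mapped to the mean embedding of the images whose
    captions contain it; a caption maps to the mean of its known token vectors.
    """

    def __init__(self, captioned: list[tuple[str, Vec]]):
        sums: dict[str, list[float]] = {}
        counts: dict[str, int] = defaultdict(int)
        for caption, emb in captioned:
            for tok in set(caption.lower().split()):
                acc = sums.setdefault(tok, [0.0] * len(emb))
                for i, x in enumerate(emb):
                    acc[i] += x
                counts[tok] += 1
        self.token_vecs = {t: tuple(x / counts[t] for x in v) for t, v in sums.items()}
        self.dim = len(captioned[0][1])

    def encode(self, caption: str) -> Vec:
        vecs = [self.token_vecs[t] for t in caption.lower().split() if t in self.token_vecs]
        if not vecs:
            return tuple([0.0] * self.dim)
        return tuple(sum(col) / len(vecs) for col in zip(*vecs))


# Unaligned toy data: programs with (pretend) rendered embeddings, and
# captioned images. No caption is ever paired with a program.
programs = [
    ((1.0, 0.0), r"\draw (0,0) circle (1);"),
    ((0.0, 1.0), r"\draw (0,0) rectangle (2,1);"),
]
captioned_images = [
    ("a red circle", (0.9, 0.1)),
    ("a blue rectangle", (0.1, 0.9)),
]

decoder = NearestProgramDecoder(programs)
adapter = MeanTokenAdapter(captioned_images)

# Zero-shot inference: caption -> adapter -> image-like embedding -> program.
print(decoder.decode(adapter.encode("a circle")))
```

Swapping either stage (a better encoder, a learned adapter, a generative decoder) leaves the other untouched, which is the practical benefit of routing both data sources through a shared image representation.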

Code Implementations: 2 repos