GR · LG · Jun 3

Aggregating LLM-Based Weak Verifiers for Spatial Layout Generation

Sharon Zhang, R. Kenny Jones, Jiajun Wu, Maneesh Agrawala

arXiv:2606.05268 · 79.5

Predicted impact: top 19% in GR · last 90 days
Originality: Incremental advance

AI Analysis

This work addresses the need for reliable automated evaluation in spatial layout tasks, offering a method to build strong verifiers from sparse human labels.

The authors propose a pipeline that aggregates LLM-generated weak verifiers into a strong verifier for spatial layout generation, achieving up to 7X improvement in F1-score over direct LLM judges and up to 66.2% improvement in layout quality via verifier-guided generation.

We present a pipeline for building and aggregating task-specific, LLM-generated weak (imperfect) verifiers into a strong verifier for spatial layout domains. Given a task description, our pipeline asks an LLM to synthesize a collection of verifier programs using a layout verification DSL. Each individual LLM-generated verifier usually provides only an imperfect check of whether the layout matches the corresponding task description. We show that by aggregating the responses of many such verifiers, we can produce a stronger verifier. Moreover, by applying techniques from weak learning, our pipeline can learn how to aggregate the weak verifiers from a very sparse set of human-labeled example layouts (about 10). We find that the strong verifiers produced by our pipeline outperform the status-quo approach of using a set of LLM judges to directly check whether a layout matches a task description, raising F1-scores by up to 7X across a variety of 3D room layout and 2D poster design tasks. We also demonstrate that verifier-guided layout generation, using natural language feedback from our strong verifiers, improves the layout quality of a base layout generator by up to 66.2% according to a human evaluator.
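The aggregation idea in the abstract can be illustrated with a small sketch. The abstract does not specify the layout verification DSL or the exact weak-learning procedure, so everything below is an assumption made for illustration: layouts are dictionaries of axis-aligned boxes, the weak verifiers (`left_of`, `no_overlap`, `inside_room`) are hand-written stand-ins for LLM-generated programs, and the weights are fit with a plain AdaBoost-style loop rather than the authors' method.

```python
import math
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height); hypothetical format
Layout = Dict[str, Box]
Verifier = Callable[[Layout], bool]

# Hypothetical weak verifiers standing in for LLM-generated DSL programs.
def left_of(a: str, b: str) -> Verifier:
    return lambda L: L[a][0] + L[a][2] <= L[b][0]

def no_overlap(a: str, b: str) -> Verifier:
    def check(L: Layout) -> bool:
        ax, ay, aw, ah = L[a]
        bx, by, bw, bh = L[b]
        return ax + aw <= bx or bx + bw <= ax or ay + ah <= by or by + bh <= ay
    return check

def inside_room(name: str, width: float, height: float) -> Verifier:
    def check(L: Layout) -> bool:
        x, y, w, h = L[name]
        return x >= 0 and y >= 0 and x + w <= width and y + h <= height
    return check

def fit_weights(verifiers: List[Verifier],
                examples: List[Tuple[Layout, bool]],
                rounds: int = 5) -> List[float]:
    """AdaBoost-style weights (an assumed choice of weak-learning technique)."""
    n = len(examples)
    w = [1.0 / n] * n  # per-example weights
    votes = [[1 if v(layout) else -1 for layout, _ in examples] for v in verifiers]
    labels = [1 if ok else -1 for _, ok in examples]
    alphas = [0.0] * len(verifiers)
    for _ in range(rounds):
        # Weighted error of every verifier under the current example weights.
        errs = [sum(wi for wi, p, y in zip(w, vs, labels) if p != y) for vs in votes]
        best = min(range(len(verifiers)), key=lambda j: errs[j])
        err = min(max(errs[best], 1e-6), 1 - 1e-6)
        if err >= 0.5:  # no verifier beats chance on the reweighted examples
            break
        alpha = 0.5 * math.log((1 - err) / err)
        alphas[best] += alpha
        # Upweight the examples the chosen verifier got wrong, then renormalize.
        w = [wi * math.exp(-alpha * p * y) for wi, p, y in zip(w, votes[best], labels)]
        total = sum(w)
        w = [wi / total for wi in w]
    return alphas

def strong_verifier(verifiers: List[Verifier], alphas: List[float]) -> Verifier:
    def check(layout: Layout) -> bool:
        score = sum(a * (1 if v(layout) else -1) for v, a in zip(verifiers, alphas))
        return score > 0
    return check

# Toy task: "place the lamp left of the sofa, both inside a 10x10 room".
verifiers = [left_of("lamp", "sofa"), no_overlap("lamp", "sofa"),
             inside_room("lamp", 10, 10), inside_room("sofa", 10, 10)]
examples = [  # a sparse set of hand-labeled layouts (made-up data)
    ({"lamp": (1, 1, 1, 1), "sofa": (4, 1, 3, 2)}, True),
    ({"lamp": (0, 5, 1, 1), "sofa": (2, 5, 3, 2)}, True),
    ({"lamp": (8, 1, 1, 1), "sofa": (2, 1, 3, 2)}, False),
    ({"lamp": (3, 1, 1, 1), "sofa": (2, 1, 3, 2)}, False),
    ({"lamp": (-2, 1, 1, 1), "sofa": (4, 1, 3, 2)}, False),
    ({"lamp": (1, 1, 1, 1), "sofa": (8, 1, 3, 2)}, False),
    ({"lamp": (2, 7, 1, 1), "sofa": (5, 6, 3, 2)}, True),
    ({"lamp": (6, 6, 1, 1), "sofa": (1, 6, 3, 2)}, False),
]
alphas = fit_weights(verifiers, examples)
strong = strong_verifier(verifiers, alphas)
```

Calling `strong(layout)` on a new layout returns the weighted vote of the weak verifiers. A weighted vote is only one possible aggregator; it is used here because it can be fit from a handful of labels, which matches the sparse-label setting the abstract describes.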
