AI · Nov 12, 2025

CrochetBench: Can Vision-Language Models Move from Describing to Doing in Crochet Domain?

arXiv:2511.09483v1 · 2 citations · h-index: 3 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses how to assess procedural competence in multimodal models, highlighting the gap between surface-level understanding and executable precision in creative domains. The contribution is incremental because it builds on existing benchmarks and DSLs.

The authors introduce CrochetBench, a benchmark that evaluates multimodal large language models on fine-grained procedural reasoning in crochet. It shifts the emphasis from describing to doing, with tasks ranging from stitch classification to generating compilable procedures. Performance declines sharply as evaluation moves from surface-level similarity to executable correctness, exposing limitations in symbolic reasoning and procedural synthesis.

We present CrochetBench, a benchmark for evaluating the ability of multimodal large language models to perform fine-grained, low-level procedural reasoning in the domain of crochet. Unlike prior benchmarks that focus on high-level description or visual question answering, CrochetBench shifts the emphasis from describing to doing: models are required to recognize stitches, select structurally appropriate instructions, and generate compilable crochet procedures. We adopt the CrochetPARADE DSL as our intermediate representation, enabling structural validation and functional evaluation via execution. The benchmark covers tasks including stitch classification, instruction grounding, and both natural language and image-to-DSL translation. Across all tasks, performance sharply declines as the evaluation shifts from surface-level similarity to executable correctness, exposing limitations in long-range symbolic reasoning and 3D-aware procedural synthesis. CrochetBench offers a new lens for assessing procedural competence in multimodal models and highlights the gap between surface-level understanding and executable precision in real-world creative domains. Code is available at https://github.com/Peiyu-Georgia-Li/crochetBench.
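The gap between surface-level similarity and executable correctness can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy stitch notation is hypothetical and is not the CrochetPARADE DSL, `token_f1` is a generic token-overlap score rather than the benchmark's metric, and `is_well_formed` is only a bracket-balance check standing in for real compilation.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Surface-level similarity: F1 over whitespace-separated tokens."""
    cand, ref = candidate.split(), reference.split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def is_well_formed(program: str) -> bool:
    """Toy stand-in for 'compiles': brackets must be balanced and nested."""
    closers = {")": "(", "]": "["}
    stack = []
    for ch in program:
        if ch in "([":
            stack.append(ch)
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack


# Hypothetical toy notation (NOT CrochetPARADE syntax).
reference = "ch 4 , [ sc , ch 2 ] * 3 , sl"
candidate = "ch 4 , [ sc , ch 2 * 3 , sl"  # one closing bracket dropped

print(token_f1(candidate, reference))   # high token overlap
print(is_well_formed(candidate))        # structurally invalid
```

In this sketch a single missing bracket barely changes the overlap score but makes the candidate fail the structural check, which mirrors the kind of drop the abstract describes when evaluation moves to execution.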

