CL, Sep 29, 2025

Evaluating Spatiotemporal Consistency in Automatically Generated Sewing Instructions

arXiv:2509.24792v1, 1 citation, h-index: 3, EMNLP
Originality: Incremental advance
AI Analysis

This work addresses the need for more accurate evaluation of assembly instructions in the sewing domain, though the advance is incremental because it builds on existing evaluation methods.

The authors propose a tree-based metric for evaluating spatiotemporal consistency in automatically generated sewing instructions. The metric correlates better with human error counts and quality ratings than traditional metrics such as BLEU and BERT similarity scores.

In this paper, we propose a novel, automatic tree-based evaluation metric for LLM-generated step-by-step assembly instructions that more accurately reflects spatiotemporal aspects of construction than traditional metrics such as BLEU and BERT similarity scores. We apply our proposed metric to the domain of sewing instructions and show that it correlates better with manually annotated error counts as well as human quality ratings, demonstrating its superiority for evaluating the spatiotemporal soundness of sewing instructions. Further experiments show that our metric is more robust than traditional approaches against artificial counterfactual examples specifically constructed to confound metrics that rely on textual similarity.
