E-comIQ-ZH: A Human-Aligned Dataset and Benchmark for Fine-Grained Evaluation of E-commerce Posters with Chain-of-Thought
This work addresses a domain-specific problem at the intersection of e-commerce design and AI evaluation for Chinese-language content. It provides tools for automated quality assessment, but the contribution is incremental: it builds on existing evaluation methods, adding a new dataset and model.
The authors address automated quality assessment of Chinese e-commerce posters, where existing methods lack functional criteria and overlook textual artifacts. They introduce E-comIQ-ZH, a framework comprising a dataset, a specialized evaluation model, and a benchmark, which yields closer alignment with expert standards and enables scalable automated assessment.
Generative AI is widely used to create commercial posters. However, rapid advances in generation have outpaced automated quality assessment. Existing models emphasize generic esthetics or low-level distortions and lack the functional criteria required for e-commerce design. The problem is especially challenging for Chinese content, where complex characters often produce subtle but critical textual artifacts that existing methods overlook. To address this, we introduce E-comIQ-ZH, a framework for evaluating Chinese e-commerce posters. We build E-comIQ-18k, the first dataset to feature multi-dimensional scores and expert-calibrated Chain-of-Thought (CoT) rationales. Using this dataset, we train E-comIQ-M, a specialized evaluation model that aligns with human expert judgment. Our framework enables E-comIQ-Bench, the first automated and scalable benchmark for the generation of Chinese e-commerce posters. Extensive experiments show that E-comIQ-M aligns more closely with expert standards and enables scalable automated assessment of e-commerce posters. All datasets, models, and evaluation tools will be released to support future research in this area. Code will be available at https://github.com/4mm7/E-comIQ-ZH.