SE AI Nov 18, 2025

Technique to Baseline QE Artefact Generation Aligned to Quality Metrics

Eitan Farchi, Kiran Nayak, Papia Ghosh Majumdar, Saritha Route

arXiv:2511.15733v1 3.4

Originality: Incremental advance

AI Analysis

This work addresses the problem of unreliable automation in Quality Engineering for practitioners, though it appears incremental as it builds on existing LLM and rubric-based methods.

The paper tackles the challenge of ensuring quality in LLM-generated Quality Engineering artefacts by introducing a systematic technique that uses quantifiable metrics for baselining and evaluation. Experimental results across 12 projects show that reverse-generated artefacts can outperform low-quality inputs and maintain high standards when inputs are strong.

Large Language Models (LLMs) are transforming Quality Engineering (QE) by automating the generation of artefacts such as requirements, test cases, and Behavior Driven Development (BDD) scenarios. However, ensuring the quality of these outputs remains a challenge. This paper presents a systematic technique to baseline and evaluate QE artefacts using quantifiable metrics. The approach combines LLM-driven generation, reverse generation, and iterative refinement guided by rubrics for clarity, completeness, consistency, and testability. Experimental results across 12 projects show that reverse-generated artefacts can outperform low-quality inputs and maintain high standards when inputs are strong. The framework enables scalable, reliable QE artefact validation, bridging automation with accountability.
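
The rubric-guided refinement loop described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract names only the four rubric dimensions, so the unweighted-mean aggregation, the 0.8 baseline threshold, the round limit, and the names `aggregate`, `refine_until_baseline`, `toy_score`, and `toy_refine` are all assumptions. The toy scorer and refiner are deterministic stand-ins for what would in practice be LLM calls.

```python
from typing import Callable, Dict, Tuple

# The four rubric dimensions named in the abstract.
DIMENSIONS = ("clarity", "completeness", "consistency", "testability")
BDD_KEYWORDS = ("Given", "When", "Then")

Scores = Dict[str, float]
Scorer = Callable[[str], Scores]
Refiner = Callable[[str, Scores], str]


def aggregate(scores: Scores) -> float:
    """Unweighted mean over the rubric dimensions (assumed aggregation)."""
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)


def refine_until_baseline(
    artefact: str,
    score: Scorer,
    refine: Refiner,
    baseline: float = 0.8,   # hypothetical threshold, not from the paper
    max_rounds: int = 5,     # hypothetical cap on refinement rounds
) -> Tuple[str, Scores, int]:
    """Re-score and refine an artefact until it meets the baseline or rounds run out."""
    scores = score(artefact)
    rounds = 0
    while aggregate(scores) < baseline and rounds < max_rounds:
        artefact = refine(artefact, scores)
        scores = score(artefact)
        rounds += 1
    return artefact, scores, rounds


def toy_score(text: str) -> Scores:
    """Deterministic stand-in for an LLM rubric scorer on a BDD scenario."""
    present = sum(kw in text for kw in BDD_KEYWORDS)
    return {
        "clarity": 1.0 if len(text.split()) <= 40 else 0.5,
        "completeness": present / len(BDD_KEYWORDS),
        "consistency": 1.0,  # stub: not actually checked here
        "testability": 1.0 if "Then" in text else 0.0,
    }


def toy_refine(text: str, scores: Scores) -> str:
    """Stand-in for an LLM refinement step: append the first missing BDD clause."""
    for kw in BDD_KEYWORDS:
        if kw not in text:
            return f"{text} {kw} <placeholder>"
    return text


if __name__ == "__main__":
    final, final_scores, rounds = refine_until_baseline(
        "Given a registered user", toy_score, toy_refine
    )
    print(rounds, round(aggregate(final_scores), 2), final)
```

In a real pipeline the `score` and `refine` callables would wrap model calls, and the per-dimension scores would be the quantifiable metrics used for baselining; keeping them as injected functions makes the loop testable without a model.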
