SE AI · Sep 21, 2024

N-Version Assessment and Enhancement of Generative AI

arXiv:2409.14071v2 · 6 citations · h-index: 7

Originality: Incremental advance

AI Analysis

This work addresses the challenge of verification and validation (V&V) for generative AI in software engineering. The advance is incremental: it builds on existing methods by leveraging version diversity.

The paper tackles the problem of untrustworthy generative AI output in code synthesis. It proposes a differential GAI approach that generates multiple versions of code and tests and compares them, so that quality evaluation rests on version diversity rather than on any single artifact.

Generative AI (GAI) holds great potential to improve software engineering productivity, but its untrustworthy outputs, particularly in code synthesis, pose significant challenges. The need for extensive verification and validation (V&V) of GAI-generated artifacts may undermine the potential productivity gains. This paper proposes a way of mitigating these risks by exploiting GAI's ability to generate multiple versions of code and tests to facilitate comparative analysis across versions. Rather than relying on the quality of a single test or code module, this "differential GAI" (D-GAI) approach promotes more reliable quality evaluation through version diversity. We introduce the Large-Scale Software Observatorium (LASSO), a platform that supports D-GAI by executing and analyzing large sets of code versions and tests. We discuss how LASSO enables rigorous evaluation of GAI-generated artifacts and propose its application in both software development and GAI research.
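
The comparative idea in the abstract can be illustrated with a minimal sketch. It assumes a simple majority vote over the outputs of several candidate versions. The function `differential_run` and the three `clamp_*` versions are hypothetical names invented for this illustration. They are not part of LASSO or of the paper, and the paper's actual analysis may differ.

```python
from collections import Counter


def differential_run(versions, inputs):
    """Run every version on every input and compare outputs by majority vote.

    versions: dict mapping a version name to a callable (hypothetical stand-ins
              for GAI-generated code versions).
    inputs:   list of argument tuples, playing the role of generated tests.
    """
    report = []
    for args in inputs:
        outputs = {}
        for name, fn in versions.items():
            try:
                outputs[name] = repr(fn(*args))
            except Exception as exc:  # a crash is also an observable behaviour
                outputs[name] = f"raised {type(exc).__name__}"
        # The most common output is treated as the reference behaviour.
        # Assumption: agreement is only evidence, not proof, of correctness.
        majority, votes = Counter(outputs.values()).most_common(1)[0]
        dissenters = sorted(n for n, out in outputs.items() if out != majority)
        report.append(
            {"args": args, "majority": majority, "votes": votes, "dissenters": dissenters}
        )
    return report


# Three hypothetical versions of the same function: clamp x into [lo, hi].
def clamp_a(x, lo, hi):
    return max(lo, min(x, hi))


def clamp_b(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x


def clamp_c(x, lo, hi):
    return min(lo, max(x, hi))  # deliberately wrong: min and max are swapped


VERSIONS = {"clamp_a": clamp_a, "clamp_b": clamp_b, "clamp_c": clamp_c}

if __name__ == "__main__":
    for row in differential_run(VERSIONS, [(5, 0, 10), (-3, 0, 10), (15, 0, 10)]):
        print(row)
```

In this sketch no single version or test is trusted on its own. A version is flagged only where it disagrees with the others, and an input on which all versions agree simply yields no dissenters. A majority can still be wrong when several versions share the same defect, so the vote narrows down where to look and does not replace review.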
