SE, Jan 26

Rethinking Artifact Evaluation for Software Engineering in the Age of Generative AI

Christoph Treude, Christopher M. Poskitt, Rashina Hoda

arXiv:2604.16306, 1 citation, h-index: 35

AI Analysis

For the software engineering research community, this position paper highlights a growing problem with peer review quality due to generative AI and proposes a solution, though it is an incremental argument without empirical validation.

The paper argues that artifact evaluation should be a first-class component of peer review in software engineering, as generative AI reduces the effort needed for narrative quality, making it a weaker signal of rigor. The authors frame peer review as an attention allocation problem and advocate for prioritizing artifact evaluation.

Peer review in software engineering research operates under tight time constraints, while generative AI has substantially reduced the human effort required to produce polished research narratives. Reviewer attention is often spent on aspects of submissions such as writing quality or literature positioning that have become relatively less effort-intensive to address, rather than on evaluating the scientific substance of a paper. At the same time, assessing whether methods are implemented correctly, analyses are sound, and claims are supported by evidence remains effort-intensive and dependent on human expertise. In software engineering research, this substance is frequently embodied in artifacts, including code, data, evidence and analysis samples, and experimental infrastructure. In this position paper, we argue that artifact evaluation should be treated as a first-class component of peer review. We frame peer review as an attention allocation problem, examine how generative AI weakens narrative quality as a signal of rigor, and argue that artifact evaluation should play a more prominent role in peer review decisions.
