AI | May 27, 2025

AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage

arXiv:2505.20662v2 | 15 citations | h-index: 31 | Has Code
Originality: Incremental advance
AI Analysis

The paper addresses inefficient experiment reproduction for AI researchers, though it is an incremental improvement over existing automation methods.

The paper tackles automatic reproduction of AI experiments by introducing AutoReproduce, a multi-agent framework that uses a paper lineage algorithm to extract implicit knowledge from the references cited by the target paper. It outperforms existing agent baselines on all five evaluation metrics, by a peak margin of over 70%.

Efficient experiment reproduction is critical to accelerating progress in artificial intelligence. However, the inherent complexity of method design and training procedures presents substantial challenges for automation. Notably, reproducing experiments often requires implicit domain-specific knowledge not explicitly documented in the original papers. To address this, we introduce the paper lineage algorithm, which identifies and extracts implicit knowledge from the relevant references cited by the target paper. Building on this idea, we propose AutoReproduce, a multi-agent framework capable of automatically reproducing experiments described in research papers in an end-to-end manner. AutoReproduce enhances code executability by generating unit tests alongside the reproduction process. To evaluate the reproduction capability, we construct ReproduceBench, a benchmark annotated with verified implementations, and introduce novel evaluation metrics to assess both the reproduction and execution fidelity. Experimental results demonstrate that AutoReproduce outperforms the existing strong agent baselines on all five evaluation metrics by a peak margin of over $70\%$. In particular, compared to the official implementations, AutoReproduce achieves an average performance gap of $22.1\%$ on $89.74\%$ of the executable experiment runs. The code will be available at https://github.com/AI9Stars/AutoReproduce.
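The two headline numbers in the abstract (the share of executable runs and the average performance gap to the official implementation) can be illustrated with a small sketch. The abstract does not define either quantity, so this assumes the gap is a relative absolute difference averaged over executable runs only; the `Run` type and `summarize_runs` function are hypothetical names, not part of AutoReproduce or ReproduceBench.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Run:
    """One experiment run (hypothetical structure, for illustration only)."""
    official: float              # metric from the official implementation
    reproduced: Optional[float]  # metric from the reproduced code; None if it did not execute


def summarize_runs(runs: List[Run]) -> Tuple[float, float]:
    """Return (executable fraction, mean relative gap over executable runs).

    Assumption: gap = |reproduced - official| / |official|. The paper may
    define its metric differently.
    """
    if not runs:
        raise ValueError("at least one run is required")
    executable = [r for r in runs if r.reproduced is not None]
    exec_rate = len(executable) / len(runs)
    if not executable:
        return exec_rate, float("nan")
    gaps = [abs(r.reproduced - r.official) / abs(r.official) for r in executable]
    return exec_rate, sum(gaps) / len(gaps)


if __name__ == "__main__":
    demo = [Run(100.0, 90.0), Run(50.0, 60.0), Run(80.0, None), Run(20.0, 20.0)]
    rate, gap = summarize_runs(demo)
    print(f"executable: {rate:.2%}, mean gap: {gap:.2%}")
```

Under this reading, runs that fail to execute lower the executable fraction but do not enter the gap average, which matches the abstract's phrasing of a gap measured "on" the executable runs.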

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.