CV · SE · Dec 3, 2025

A Retrieval-Augmented Generation Approach to Extracting Algorithmic Logic from Neural Networks

arXiv:2512.04329v1 · 9 citations · h-index: 98 · Has Code
Originality: Incremental advance
AI Analysis

NN-RAG is presented as a first open-source system that lets researchers and developers systematically quantify and expand the diversity of executable neural architectures across repositories. The advance appears incremental, since it mainly automates processes that were previously manual.

The researchers address the problem of discovering and extracting reusable neural network modules from open-source PyTorch repositories. Their system, NN-RAG, is a retrieval-augmented generation pipeline that, applied to 19 major repositories, extracted 1,289 candidate blocks and validated 941 of them (73.0%) as executable modules. Over 80% are reported to be structurally unique, and NN-RAG contributes approximately 72% of the novel network structures in the LEMUR dataset.

Reusing existing neural-network components is central to research efficiency, yet discovering, extracting, and validating such modules across thousands of open-source repositories remains difficult. We introduce NN-RAG, a retrieval-augmented generation system that converts large, heterogeneous PyTorch codebases into a searchable and executable library of validated neural modules. Unlike conventional code search or clone-detection tools, NN-RAG performs scope-aware dependency resolution, import-preserving reconstruction, and validator-gated promotion -- ensuring that every retrieved block is scope-closed, compilable, and runnable. Applied to 19 major repositories, the pipeline extracted 1,289 candidate blocks, validated 941 (73.0%), and demonstrated that over 80% are structurally unique. Through multi-level de-duplication (exact, lexical, structural), we find that NN-RAG contributes the overwhelming majority of unique architectures to the LEMUR dataset, supplying approximately 72% of all novel network structures. Beyond quantity, NN-RAG uniquely enables cross-repository migration of architectural patterns, automatically identifying reusable modules in one project and regenerating them, dependency-complete, in another context. To our knowledge, no other open-source system provides this capability at scale. The framework's neutral specifications further allow optional integration with language models for synthesis or dataset registration without redistributing third-party code. Overall, NN-RAG transforms fragmented vision code into a reproducible, provenance-tracked substrate for algorithmic discovery, offering a first open-source solution that both quantifies and expands the diversity of executable neural architectures across repositories.
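
The abstract names three de-duplication levels (exact, lexical, structural) without defining them. The Python sketch below shows one plausible reading and is not NN-RAG's implementation. It assumes "exact" means byte-identical source, "lexical" means identical token streams once comments and layout are ignored, and "structural" means identical sequences of AST node types. The function names (`exact_key`, `lexical_key`, `structural_key`, `deduplicate`) are hypothetical, and the input blocks are assumed to have already passed validation, so parsing errors are not handled.

```python
import ast
import hashlib
import io
import tokenize


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def exact_key(source: str) -> str:
    """Level 1 (assumed): byte-identical source gives an identical key."""
    return _digest(source)


def lexical_key(source: str) -> str:
    """Level 2 (assumed): ignore comments, blank lines, and indent width."""
    parts = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER):
            continue
        if tok.type in (tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT):
            # Keep block structure but not the literal whitespace.
            parts.append(tokenize.tok_name[tok.type])
        else:
            parts.append(tok.string)
    return _digest(" ".join(parts))


def structural_key(source: str) -> str:
    """Level 3 (assumed): compare AST node types only.

    This deliberately ignores every identifier and constant, so it is
    coarse: a real system would likely keep layer types such as Conv2d.
    """
    tree = ast.parse(source)
    return _digest(" ".join(type(node).__name__ for node in ast.walk(tree)))


def deduplicate(blocks):
    """Return (unique_blocks, dropped), where dropped counts duplicates
    by the first (strictest) level at which they matched an earlier block."""
    key_fns = (("exact", exact_key),
               ("lexical", lexical_key),
               ("structural", structural_key))
    seen = {name: set() for name, _ in key_fns}
    dropped = {name: 0 for name, _ in key_fns}
    unique = []
    for source in blocks:
        keys = {name: fn(source) for name, fn in key_fns}
        hit = next((name for name, _ in key_fns if keys[name] in seen[name]), None)
        if hit is None:
            unique.append(source)
        else:
            dropped[hit] += 1
        for name, key in keys.items():
            seen[name].add(key)
    return unique, dropped


if __name__ == "__main__":
    a = "class A:\n    def f(self, x):\n        return x + 1\n"
    c = "class A:\n    def f(self, x):  # add one\n        return x + 1\n"
    d = "class B:\n    def g(self, y):\n        return y + 2\n"
    e = "class C:\n    def f(self, x):\n        return x * x + 1\n"
    unique, dropped = deduplicate([a, a, c, d, e])
    print(len(unique), dropped)
```

Checking the strictest level first means each duplicate is attributed to the cheapest test that catches it, which is one way to produce the kind of per-level uniqueness counts the abstract reports.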

