SE AI Jan 19

ArchAgent: Scalable Legacy Software Architecture Recovery with LLMs

Rusheng Pan, Bingcheng Mao, Tianyi Ma, Zhenhua Ling

arXiv:2601.13007v1 2.9 Has Code

Originality: Incremental advance

AI Analysis

This paper addresses architecture recovery for legacy software systems. The contribution is incremental: it builds on existing methods that combine LLMs with static analysis.

The paper tackles the recovery of accurate software architecture from large-scale legacy codebases, a task hindered by architectural drift and missing relations. It presents ArchAgent, a scalable agent-based framework that, in evaluations on GitHub projects, improves over existing benchmarks. An ablation study shows that dependency context boosts accuracy.

Recovering accurate architecture from large-scale legacy software is hindered by architectural drift, missing relations, and the limited context of Large Language Models (LLMs). We present ArchAgent, a scalable agent-based framework that combines static analysis, adaptive code segmentation, and LLM-powered synthesis to reconstruct multiview, business-aligned architectures from cross-repository codebases. ArchAgent introduces scalable diagram generation with contextual pruning and integrates cross-repository data to identify business-critical modules. Evaluations on typical large-scale GitHub projects show significant improvements over existing benchmarks. An ablation study confirms that dependency context improves the accuracy of the architectures generated for production-level repositories, and a real-world case study demonstrates effective recovery of critical business logic from legacy projects. The dataset is available at https://github.com/panrusheng/arch-eval-benchmark.
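The abstract does not specify how static analysis and contextual pruning are implemented, so the following is only an illustrative sketch of the general idea, not ArchAgent's method. It assumes Python sources, uses the standard `ast` module to extract an import-level dependency graph, and prunes by a simple degree ranking so that only the most connected modules would be passed to an LLM. The function names `import_edges` and `prune_by_degree`, the degree heuristic, and the sample module names are all hypothetical.

```python
import ast
from collections import defaultdict


def import_edges(sources):
    """Map each module to the set of in-project modules it imports.

    `sources` is a dict of module name -> source text (hypothetical input shape).
    """
    edges = defaultdict(set)
    for name, code in sources.items():
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                targets = [node.module]
            else:
                continue
            for target in targets:
                # Ignore third-party and standard-library imports.
                if target in sources and target != name:
                    edges[name].add(target)
    return dict(edges)


def prune_by_degree(edges, modules, keep):
    """Return the `keep` modules with the most incoming plus outgoing edges.

    Ties are broken alphabetically. This is an assumed heuristic,
    chosen only to illustrate pruning under a context budget.
    """
    degree = {m: 0 for m in modules}
    for src, dsts in edges.items():
        degree[src] += len(dsts)
        for dst in dsts:
            degree[dst] += 1
    ranked = sorted(modules, key=lambda m: (-degree[m], m))
    return ranked[:keep]


# Hypothetical four-file project plus an unused utility module.
sources = {
    "api": "import orders\nimport db\n",
    "orders": "import billing\nimport db\n",
    "billing": "import db\nimport os\n",
    "db": "",
    "util": "",
}
edges = import_edges(sources)
core = prune_by_degree(edges, list(sources), keep=2)
```

In this toy project the isolated `util` module has degree zero and is the first to be dropped, which mirrors the intuition that poorly connected code contributes little to an architectural view. A real system would likely need call-level or cross-repository relations rather than imports alone.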
