A Large-Scale Study on the Development and Issues of Multi-Agent AI Systems
This study addresses the lack of understanding of how multi-agent AI systems evolve and are maintained, knowledge that developers and researchers need to improve reliability and sustainability. Its contribution is incremental, however, because it applies existing empirical methods to a new domain.
This paper presents the first large-scale empirical study of multi-agent AI systems, analyzing over 42K commits and over 4.7K resolved issues across eight open-source systems. It reveals development patterns such as perfective (feature-enhancing) commits making up 40.8% of all changes and median issue resolution times ranging from under one day to about two weeks.
The rapid emergence of multi-agent AI systems (MAS), including LangChain, CrewAI, and AutoGen, has shaped how large language model (LLM) applications are developed and orchestrated. However, little is known about how these systems evolve and are maintained in practice. This paper presents the first large-scale empirical study of open-source MAS, analyzing over 42K unique commits and over 4.7K resolved issues across eight leading systems. Our analysis identifies three distinct development profiles: sustained, steady, and burst-driven. These profiles reflect substantial variation in ecosystem maturity. Perfective commits constitute 40.8% of all changes, suggesting that feature enhancement is prioritized over corrective maintenance (27.4%) and adaptive updates (24.3%). Data about issues shows that the most frequent concerns involve bugs (22%), infrastructure (14%), and agent coordination challenges (10%). Issue reporting also increased sharply across all frameworks starting in 2023. Median resolution times range from under one day to about two weeks, with distributions skewed toward fast responses but a minority of issues requiring extended attention. These results highlight both the momentum and the fragility of the current ecosystem, emphasizing the need for improved testing infrastructure, documentation quality, and maintenance practices to ensure long-term reliability and sustainability.
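The two headline statistics in the abstract, the share of commits per maintenance category and the median issue resolution time, can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which the abstract does not describe. The function names `category_shares` and `median_resolution_days` are hypothetical, the toy data is invented for illustration, and the sketch assumes each commit already carries a category label and each issue has an open and a close timestamp.

```python
from collections import Counter
from datetime import datetime
from statistics import median


def category_shares(labels):
    """Return each label's share of all commits, in percent."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


def median_resolution_days(issues):
    """Median open-to-close time in days over (opened, closed) pairs."""
    durations = [
        (closed - opened).total_seconds() / 86400  # seconds per day
        for opened, closed in issues
    ]
    return median(durations)


# Toy inputs (invented for illustration, not data from the study).
commit_labels = ["perfective"] * 4 + ["corrective"] * 3 + ["adaptive"] * 3
issues = [
    (datetime(2023, 1, 1), datetime(2023, 1, 1, 12)),  # closed after 0.5 days
    (datetime(2023, 1, 2), datetime(2023, 1, 4)),      # closed after 2 days
    (datetime(2023, 1, 3), datetime(2023, 2, 2)),      # closed after 30 days
]

shares = category_shares(commit_labels)
med = median_resolution_days(issues)
print(shares)
print(med)
```

In this toy data the single 30-day issue pulls the mean resolution time well above the median, which is the kind of skew the abstract describes: most issues close quickly while a minority take much longer. The median is the more robust summary in that situation.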