SE AI LG
Sep 29, 2025

A Cartography of Open Collaboration in Open Source AI: Mapping Practices, Motivations, and Governance in 14 Open Large Language Model Projects

Johan Linåker, Cailean Osborne, Jennifer Ding, Ben Burtenshaw

arXiv:2509.25397v1
2 citations
h-index: 13
Has Code

Originality: Synthesis-oriented

AI Analysis

This research addresses a gap in understanding how open LLM projects are organized and governed, which is important for stakeholders aiming to foster the open AI ecosystem.

The study mapped collaboration practices in 14 open large language model projects. It found that collaboration extends beyond the models themselves to include datasets, benchmarks, and forums; that developers are motivated by goals such as democratizing AI access and building regional ecosystems; and that the projects exhibit five distinct organizational models.

The proliferation of open large language models (LLMs) is fostering a vibrant ecosystem of research and innovation in artificial intelligence (AI). However, the methods of collaboration used to develop open LLMs both before and after their public release have not yet been comprehensively studied, limiting our understanding of how open LLM projects are initiated, organized, and governed, as well as what opportunities exist to foster this ecosystem further. We address this gap through an exploratory analysis of open collaboration throughout the development and reuse lifecycle of open LLMs, drawing on semi-structured interviews with the developers of 14 open LLMs from grassroots projects, research institutes, startups, and Big Tech companies in North America, Europe, Africa, and Asia. We make three key contributions to research and practice. First, collaboration in open LLM projects extends far beyond the LLMs themselves, encompassing datasets, benchmarks, open source frameworks, leaderboards, knowledge sharing and discussion forums, and compute partnerships, among others. Second, open LLM developers have a variety of social, economic, and technological motivations, from democratizing AI access and promoting open science to building regional ecosystems and expanding language representation. Third, the sampled open LLM projects exhibit five distinct organizational models, ranging from single-company projects to non-profit-sponsored grassroots projects, which vary in their centralization of control and in the community engagement strategies used throughout the open LLM lifecycle. We conclude with practical recommendations for stakeholders seeking to support the global community building a more open future for AI.
