SE, LG | Nov 14, 2024

How do Machine Learning Models Change?

arXiv:2411.09645v2 | 6 citations | h-index: 5 | Has Code | ACM Trans Softw Eng Methodol
Originality: Synthesis-oriented
AI Analysis

The paper addresses the problem of understanding model evolution for AI researchers and practitioners, but its contribution is incremental because it applies existing methods to new data.

This study addresses the lack of large-scale longitudinal analyses of how machine learning models evolve. It analyzes over 680,000 commits and 2,251 releases from Hugging Face and finds that commit activity aligns with iterative methodologies such as CRISP-DM, while releases consolidate updates to model outputs, sharing, and documentation.

The proliferation of Machine Learning (ML) models and their open-source implementations has transformed Artificial Intelligence research and applications. Platforms like Hugging Face (HF) enable this evolving ecosystem, yet a large-scale longitudinal study of how these models change is lacking. This study addresses this gap by analyzing over 680,000 commits from 100,000 models and 2,251 releases from 202 of these models on HF using repository mining and longitudinal methods. We apply an extended ML change taxonomy to classify commits and use Bayesian networks to model temporal patterns in commit and release activities. Our findings show that commit activities align with established data science methodologies, such as the Cross-Industry Standard Process for Data Mining (CRISP-DM), emphasizing iterative refinement. Release patterns tend to consolidate significant updates, particularly in model outputs, sharing, and documentation, distinguishing them from granular commits. Furthermore, projects with higher popularity exhibit distinct evolutionary paths, often starting from a more mature baseline with fewer foundational commits in their public history. In contrast, those with intensive collaboration show unique documentation and technical evolution patterns. These insights enhance the understanding of model changes on community platforms and provide valuable guidance for best practices in model maintenance.
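As a rough illustration of the pipeline the abstract describes (classify each commit, then look at temporal patterns), the sketch below labels commit messages with keyword rules and estimates how often one category follows another. Everything in it is an assumption made for illustration: the category names, the keywords in `RULES`, and the sample messages are hypothetical and are not the paper's taxonomy or data, and a first-order transition table is a much simpler stand-in for the Bayesian networks the authors use.

```python
from collections import Counter, defaultdict

# Hypothetical keyword rules for illustration only; NOT the paper's taxonomy.
RULES = {
    "documentation": ("readme", "model card", "doc"),
    "training": ("train", "epoch", "checkpoint", "fine-tune"),
    "configuration": ("config", "tokenizer"),
    "sharing": ("upload", "license", "release"),
}


def classify(message: str) -> str:
    """Return the first category whose keyword appears in the message."""
    text = message.lower()
    for label, keywords in RULES.items():
        if any(keyword in text for keyword in keywords):
            return label
    return "other"


def transition_probabilities(labels):
    """Estimate P(next category | current category) from consecutive commits."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(labels, labels[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {nxt: n / sum(following.values()) for nxt, n in following.items()}
        for prev, following in counts.items()
    }


# Made-up commit history for a single model repository, oldest first.
commits = [
    "initial commit",
    "Upload model weights",
    "Update README.md",
    "Training in progress, epoch 1",
    "Training in progress, epoch 2",
    "Update config.json",
    "Update README.md",
]

labels = [classify(message) for message in commits]
probs = transition_probabilities(labels)

for prev, following in probs.items():
    print(prev, "->", following)
```

A table like this only captures which category tends to follow which within one repository. Relating those patterns to project attributes such as popularity or collaboration intensity, as the study does, would need a richer model.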

