SELG | Sep 15, 2025

Analysing Python Machine Learning Notebooks with Moose

arXiv:2509.11748v1 | h-index: 29
Originality: Incremental advance
AI Analysis

This work addresses unreliable, poorly structured ML code in notebooks, a problem that affects both developers and researchers. The contribution is incremental: it builds on existing analysis tooling (Moose) and extends it with a multi-level approach.

The paper tackles the problem of low-quality machine learning code in Python notebooks by introducing Vespucci Linter, a static analysis tool that identifies issues across three levels: general Python coding, notebook structure, and ML-specific aspects. The tool was applied to 5,000 Kaggle notebooks, revealing violations at all levels and demonstrating its potential to improve ML development quality.

Machine Learning (ML) code, particularly within notebooks, often exhibits lower quality compared to traditional software. Bad practices arise at three distinct levels: general Python coding conventions, the organizational structure of the notebook itself, and ML-specific aspects such as reproducibility and correct API usage. However, existing analysis tools typically focus on only one of these levels and struggle to capture ML-specific semantics, limiting their ability to detect issues. This paper introduces Vespucci Linter, a static analysis tool with multi-level capabilities, built on Moose and designed to address this challenge. Leveraging a metamodeling approach that unifies the notebook's structural elements with Python code entities, our linter enables a more contextualized analysis to identify issues across all three levels. We implemented 22 linting rules derived from the literature and applied our tool to a corpus of 5,000 notebooks from the Kaggle platform. The results reveal violations at all levels, validating the relevance of our multi-level approach and demonstrating Vespucci Linter's potential to improve the quality and reliability of ML development in notebook environments.
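To make the three levels more concrete, here is a minimal Python sketch of a multi-level notebook check. It is not Vespucci Linter, which is built on Moose and relies on a metamodel. The sketch simply reads the nbformat JSON structure and parses each code cell with Python's `ast` module. Each finding is tagged with its cell index, which loosely echoes the idea of linking code entities to notebook structure. The three rules (`wildcard-import`, `out-of-order-execution`, `unseeded-split`), the `Violation` record, and `lint_notebook` are hypothetical illustrations chosen for this sketch. They are not necessarily among the paper's 22 rules.

```python
import ast
from dataclasses import dataclass


@dataclass
class Violation:
    level: str       # "python", "notebook", or "ml" (the three levels)
    rule: str        # hypothetical rule identifier, for illustration only
    cell_index: int  # position of the offending cell in the notebook
    message: str


def _cell_source(cell: dict) -> str:
    """Return a cell's source as one string (nbformat allows str or list of str)."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def _strip_magics(code: str) -> str:
    """Drop IPython magics and shell escapes, which are not valid Python syntax."""
    return "\n".join(
        line for line in code.splitlines()
        if not line.lstrip().startswith(("%", "!"))
    )


def lint_notebook(nb: dict) -> list[Violation]:
    """Hypothetical multi-level lint pass over a notebook loaded as a dict."""
    violations: list[Violation] = []
    code_cells = [(i, c) for i, c in enumerate(nb.get("cells", []))
                  if c.get("cell_type") == "code"]

    # Notebook level: execution counts should not decrease in cell order.
    last_count = 0
    for i, cell in code_cells:
        count = cell.get("execution_count")
        if count is None:
            continue  # cell was never executed
        if count < last_count:
            violations.append(Violation(
                "notebook", "out-of-order-execution", i,
                f"executed as [{count}] after a cell executed as [{last_count}]"))
        last_count = max(last_count, count)

    # Python and ML levels: inspect each cell's syntax tree.
    for i, cell in code_cells:
        try:
            tree = ast.parse(_strip_magics(_cell_source(cell)))
        except SyntaxError:
            continue  # unparsable cells are skipped in this sketch
        for node in ast.walk(tree):
            # Python level: wildcard imports hide where names come from.
            if isinstance(node, ast.ImportFrom) and any(
                    alias.name == "*" for alias in node.names):
                violations.append(Violation(
                    "python", "wildcard-import", i,
                    f"from {node.module} import *"))
            # ML level: a data split without random_state is not reproducible.
            if isinstance(node, ast.Call):
                func = node.func
                name = (func.attr if isinstance(func, ast.Attribute)
                        else getattr(func, "id", None))
                if name == "train_test_split" and not any(
                        kw.arg == "random_state" for kw in node.keywords):
                    violations.append(Violation(
                        "ml", "unseeded-split", i,
                        "train_test_split called without random_state"))
    return violations


if __name__ == "__main__":
    notebook = {"cells": [
        {"cell_type": "code", "execution_count": 2,
         "source": ["from sklearn.model_selection import *\n"]},
        {"cell_type": "markdown", "source": "# Split the data"},
        {"cell_type": "code", "execution_count": 1,
         "source": "X_tr, X_te = train_test_split(X, test_size=0.2)"},
    ]}
    for v in lint_notebook(notebook):
        print(v.level, v.rule, v.cell_index, v.message)
```

On this sample, the sketch reports one violation at each level. Cell 2 ran before cell 0 (notebook level). Cell 0 uses a wildcard import (Python level). Cell 2 splits the data without a seed (ML level). A real tool would also need to track names across cells, for example to resolve an aliased import of `train_test_split`. That kind of cross-cell context is what the paper's unified metamodel is meant to provide.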
