SE LG Feb 20, 2024

Towards MLOps: A DevOps Tools Recommender System for Machine Learning System

Pir Sami Ullah Shah, Naveed Ahmad, Mirza Omer Beg

arXiv:2402.12867v17.03 citations h-index: 23 Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of tool selection in MLOps for practitioners, but it is incremental as it applies existing machine learning methods to a new domain-specific task.

The paper tackles the challenge of selecting appropriate open-source tools for MLOps by proposing a recommender system framework that processes contextual information of machine learning projects to recommend toolchains, with random forest achieving the highest f-score of 0.66 among tested approaches.

Applying DevOps practices to machine learning systems is termed MLOps. Unlike traditional systems, which evolve with changing requirements, machine learning systems evolve with new data. The objective of MLOps is to connect different open-source tools into a pipeline that automatically constructs a dataset, trains the machine learning model, deploys the model to production, and stores different versions of the model and dataset. The benefit of MLOps is fast delivery of newly trained models to production so that results remain accurate. Furthermore, MLOps practice affects the overall quality of software products and depends entirely on open-source tools. Selecting relevant open-source tools is considered challenging, so a generalized method for selecting appropriate tools is desirable. In this paper, we present a framework for a recommendation system that processes the contextual information (e.g., nature of the data, type of the data) of a machine learning project and recommends a relevant toolchain (tech stack) for the operationalization of machine learning systems. To check the applicability of the proposed framework, four approaches (rule-based, random forest, decision trees, and k-nearest neighbors) were investigated, with precision, recall, and f-score measured for each. Random forest outperformed the other approaches, with the highest f-score of 0.66.
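To make the evaluation concrete, here is a minimal sketch of how a rule-based recommender could map project context to a toolchain, and how precision, recall, and f-score could be computed for one recommendation. It is not the authors' implementation. The context keys, the rules, the tool names, and the ground-truth toolchain are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Set, Tuple


def recommend(context: Dict[str, str]) -> Set[str]:
    """Hypothetical rule-based recommender: project context -> toolchain."""
    tools = {"git"}  # assumption: every project gets source control
    # The rules below are illustrative assumptions, not taken from the paper.
    if context.get("data_type") == "image":
        tools.add("dvc")
    if context.get("data_nature") == "streaming":
        tools.add("kafka")
    if context.get("deployment") == "container":
        tools.add("docker")
    return tools


def precision_recall_f1(
    recommended: Set[str], relevant: Set[str]
) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 for a single recommendation."""
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical project context and ground-truth toolchain.
context = {"data_type": "image", "data_nature": "batch", "deployment": "container"}
recommended = recommend(context)
relevant = {"git", "dvc", "mlflow", "docker"}

p, r, f = precision_recall_f1(recommended, relevant)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case every recommended tool is relevant, but one relevant tool is missed, so precision is higher than recall. A learned approach such as random forest would replace the hand-written rules in `recommend` with a model trained on labeled projects, while the metric computation stays the same.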
