SE · LG · Dec 5, 2024

An Efficient Model Maintenance Approach for MLOps

arXiv:2412.04657v2 · 3 citations · h-index: 48 · Empir Softw Eng
AI Analysis

This work addresses the costly and time-consuming maintenance of models in MLOps for industries that rely on time series data, though it appears incremental, since it extends existing MLOps pipelines with a specific tool.

The paper tackles the degradation of ML model performance caused by concept drift in time series data. It proposes a model reuse approach that identifies recurrent data distribution patterns so that unnecessary retrainings can be avoided. The approach maintains performance comparable to the baselines while reducing computation time and costs to one-eighth.

In recent years, many industries have utilized machine learning (ML) models in their systems. Ideally, ML models should be trained on and applied to data from the same distribution. However, in many application areas data evolves over time, leading to concept drift, which in turn causes the performance of ML models to degrade over time. Therefore, keeping ML models up to date plays a critical role in the MLOps pipeline. Existing ML model maintenance approaches are often computationally resource-intensive, costly, time-consuming, and model-dependent. Thus, we propose an improved MLOps pipeline, a new model maintenance approach, and a Similarity-Based Model Reuse (SimReuse) tool to address the challenges of ML model maintenance. Through a preliminary study, we identify seasonal and recurrent data distribution patterns in time series datasets. Recurrent data distribution patterns enable us to reuse previously trained models for similar distributions in the future, thus avoiding frequent unnecessary retrainings. We then integrate the model reuse approach into the MLOps pipeline, yielding our improved MLOps pipeline. Furthermore, we develop SimReuse, a tool that implements the new components of our MLOps pipeline: it stores models and reuses them for inference on future data segments with similar data distributions. Our evaluation results on five time series datasets demonstrate that our model reuse approach can maintain the models' performance while significantly reducing maintenance time, costs, and the number of retrainings. Our model reuse approach achieves ML model performance comparable to the best baselines while reducing computation time and costs to one-eighth. Therefore, industries and practitioners can benefit from our approach and use our tool to maintain their ML models' performance in the deployment phase while reducing maintenance time and costs.
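The core idea of reusing a stored model when a new data segment resembles an earlier one can be illustrated with a minimal sketch. This summary does not specify which similarity measure, threshold, or interface SimReuse actually uses, so everything below is an assumption: the two-sample Kolmogorov-Smirnov (KS) statistic is swapped in as the distribution-similarity measure, and the names `ModelRepository`, `get_model`, `train_fn`, and `threshold` are hypothetical. Each stored entry keeps the raw reference segment alongside its model, which is a simplification.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: the largest gap between the empirical CDFs of a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    d = 0.0
    # Empirical CDFs only change at sample points, so checking those points finds the max gap.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / n
        cdf_b = bisect_right(b_sorted, x) / m
        d = max(d, abs(cdf_a - cdf_b))
    return d


@dataclass
class ModelRepository:
    """Hypothetical store of (reference segment, model) pairs; not the SimReuse API."""
    train_fn: Callable[[Sequence[float]], object]  # assumed: trains a model on one segment
    threshold: float = 0.2                          # assumed: max KS distance that counts as "similar"
    entries: List[Tuple[List[float], object]] = field(default_factory=list)
    retrainings: int = 0

    def get_model(self, segment: Sequence[float]) -> Tuple[object, bool]:
        """Return (model, reused): reuse the closest stored model if similar enough, else retrain."""
        best_model, best_dist = None, float("inf")
        for reference, model in self.entries:
            dist = ks_statistic(reference, segment)
            if dist < best_dist:
                best_model, best_dist = model, dist
        if best_model is not None and best_dist <= self.threshold:
            return best_model, True
        # No sufficiently similar distribution seen before: retrain and store the new model.
        model = self.train_fn(segment)
        self.entries.append((list(segment), model))
        self.retrainings += 1
        return model, False


# Usage: a recurring "low" season, a "high" season, then both recur.
low = [float(i % 10) for i in range(100)]
high = [float(i % 10) + 50.0 for i in range(100)]
low_again = [float((3 * i) % 10) for i in range(100)]  # same values as `low`, different order

repo = ModelRepository(train_fn=lambda seg: sum(seg) / len(seg))  # toy mean-predictor "model"
for segment in (low, high, low_again, high):
    model, reused = repo.get_model(segment)
    print(f"model={model:.1f} reused={reused}")
print("retrainings:", repo.retrainings)
```

In this toy run only the first occurrence of each season triggers training, so four segments cost two retrainings. A real deployment would also need to decide how to segment the stream and how to tune the threshold, which this sketch leaves open.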
