LG · Dec 16, 2025

End-to-End Data Quality-Driven Framework for Machine Learning in Production Environment

arXiv:2512.19723v1
Originality: Incremental advance
AI Analysis

The paper addresses the gap between theoretical data quality methods and practical MLOps in industrial applications, offering a solution for time-sensitive decision-making in dynamic environments.

The paper integrates data quality assessment with ML model operations in production through an end-to-end framework. In a steel manufacturing case study, it reports a 12% improvement in model performance and a fourfold reduction in prediction latency.

This paper introduces a novel end-to-end framework that efficiently integrates data quality assessment with machine learning (ML) model operations in real-time production environments. While existing approaches treat data quality assessment and ML systems as isolated processes, our framework addresses the critical gap between theoretical methods and practical implementation by combining dynamic drift detection, adaptive data quality metrics, and MLOps into a cohesive, lightweight system. The key innovation lies in its operational efficiency, enabling real-time, quality-driven ML decision-making with minimal computational overhead. We validate the framework in a steel manufacturing company's Electroslag Remelting (ESR) vacuum pumping process, demonstrating a 12% improvement in model performance (R² = 94%) and a fourfold reduction in prediction latency. By exploring the impact of data quality acceptability thresholds, we provide actionable insights into balancing data quality standards and predictive performance in industrial applications. This framework represents a significant advancement in MLOps, offering a robust solution for time-sensitive, data-driven decision-making in dynamic industrial environments.
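The abstract does not specify which quality metrics, drift test, or threshold values the framework uses. As a rough illustration only, the sketch below gates each incoming sensor batch on a hypothetical quality score (the mean of completeness and range validity) compared against an acceptability threshold, followed by a crude drift check based on the batch mean's shift from a reference window, measured in reference standard deviations. All names (`assess_batch`, `gate`, `acceptability`, `drift_limit`) and the three outcomes are assumptions for illustration, not the authors' method.

```python
import math
import statistics
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class QualityReport:
    completeness: float  # fraction of values that are present (not None/NaN)
    validity: float      # fraction of present values inside [lo, hi]
    drift_z: float       # |batch mean - reference mean| / reference std

    @property
    def score(self) -> float:
        # Hypothetical aggregate: plain average of the two per-batch metrics.
        return (self.completeness + self.validity) / 2


def assess_batch(batch: Sequence[Optional[float]], reference: Sequence[float],
                 lo: float, hi: float) -> QualityReport:
    """Compute illustrative quality metrics for one batch of sensor readings."""
    present = [x for x in batch if x is not None and not math.isnan(x)]
    completeness = len(present) / len(batch) if batch else 0.0
    in_range = [x for x in present if lo <= x <= hi]
    validity = len(in_range) / len(present) if present else 0.0
    # Reference window needs at least two points for statistics.stdev.
    ref_mean = statistics.fmean(reference)
    ref_std = statistics.stdev(reference)
    batch_mean = statistics.fmean(present) if present else ref_mean
    drift_z = abs(batch_mean - ref_mean) / ref_std if ref_std > 0 else 0.0
    return QualityReport(completeness, validity, drift_z)


def gate(report: QualityReport, acceptability: float = 0.9,
         drift_limit: float = 3.0) -> str:
    """Decide what to do with a batch; thresholds are hypothetical defaults."""
    if report.score < acceptability:
        return "reject"       # quality below the acceptability threshold
    if report.drift_z > drift_limit:
        return "drift_alert"  # quality acceptable, but distribution has shifted
    return "predict"          # safe to pass the batch to the model


reference = [10.0, 10.5, 9.5, 10.2, 9.8]
batches = (
    [10.1, 9.9, 10.0, 10.3],             # clean, in-distribution
    [10.1, None, float("nan"), 10.0],    # half the readings missing
    [12.0, 12.2, 11.8, 12.1],            # complete but shifted upward
)
for batch in batches:
    print(gate(assess_batch(batch, reference, lo=5.0, hi=15.0)))
# predict
# reject
# drift_alert
```

Raising `acceptability` rejects more batches and lowering it lets noisier data reach the model, which is the kind of trade-off between data quality standards and predictive performance that the threshold study examines. The paper's actual metric definitions and decision logic may differ.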
