Addressing Quality Challenges in Deep Learning: The Role of MLOps and Domain Knowledge
This experience paper addresses quality management problems faced by software engineers working on deep learning projects, offering incremental insights drawn from practical experience.
This paper tackles the challenge of managing quality attributes such as correctness and resource efficiency in deep learning systems by exploring the role of MLOps practices and domain knowledge. It finds that these approaches help teams assess design decisions and justify when to stop optimizing in order to maximize system reliability.
Deep learning (DL) systems present unique challenges in software engineering, especially concerning quality attributes such as correctness and resource efficiency. While DL models excel at specific tasks, engineering the DL system as a whole remains essential. The effort, cost, and potentially diminishing returns of continual improvement must be carefully evaluated, as software engineers often face the critical decision of when to stop refining a system with respect to its quality attributes. This experience paper explores the role of MLOps practices, such as monitoring and experiment tracking, in creating transparent and reproducible experimentation environments that enable teams to assess and justify the impact of design decisions on quality attributes. Furthermore, we report on our experiences addressing these quality challenges by embedding domain knowledge into the design of a DL model and into its integration within a larger system. The findings offer actionable insights into the benefits of domain knowledge and MLOps, and into the strategic question of when to limit further optimization in DL projects to maximize overall system quality and reliability.
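To make the "when to stop" decision concrete, the following is a minimal illustrative sketch, not the method used in the paper. It assumes that each tracked experiment records one correctness proxy (accuracy) and one resource-efficiency proxy (latency). The names `Run` and `should_stop`, and the thresholds `min_gain`, `max_latency_ms`, and `patience`, are hypothetical choices made for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One tracked experiment; the fields are illustrative assumptions."""
    name: str
    accuracy: float    # correctness proxy, higher is better
    latency_ms: float  # resource-efficiency proxy, lower is better


def should_stop(runs, min_gain=0.005, max_latency_ms=50.0, patience=2):
    """Suggest stopping when returns diminish within the resource budget.

    Returns True when the best accuracy among the last `patience` runs
    exceeds the best earlier accuracy by less than `min_gain`, and the
    most accurate run overall meets the latency budget.
    """
    if len(runs) <= patience:
        return False  # not enough history to judge diminishing returns
    best_before = max(r.accuracy for r in runs[:-patience])
    recent_best = max(r.accuracy for r in runs[-patience:])
    most_accurate = max(runs, key=lambda r: r.accuracy)
    return (recent_best - best_before < min_gain
            and most_accurate.latency_ms <= max_latency_ms)


# Hypothetical experiment history, in the order the runs were performed.
history = [
    Run("baseline", 0.80, 40.0),
    Run("augmentation", 0.88, 42.0),
    Run("larger-model", 0.882, 45.0),
    Run("tuned", 0.883, 44.0),
]
print(should_stop(history))
```

The value of a rule like this lies less in the specific thresholds than in the fact that they are written down: with a reproducible experiment log, a team can point to the recorded runs when justifying that further optimization is unlikely to pay off.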