SENov 1, 2023
Model-driven Engineering for Machine Learning Components: A Systematic Literature ReviewHira Naveed, Chetan Arora, Hourieh Khalajzadeh et al.
Context: Machine Learning (ML) has become widely adopted as a component in many modern software applications. Due to the large volumes of data available, organizations want to increasingly leverage their data to extract meaningful insights and enhance business profitability. ML components enable predictive capabilities, anomaly detection, recommendation, accurate image and text processing, and informed decision-making. However, developing systems with ML components is not trivial; it requires time, effort, knowledge, and expertise in ML, data processing, and software engineering. There have been several studies on the use of model-driven engineering (MDE) techniques to address these challenges when developing traditional software and cyber-physical systems. Recently, there has been a growing interest in applying MDE for systems with ML components. Objective: The goal of this study is to further explore the promising intersection of MDE with ML (MDE4ML) through a systematic literature review (SLR). Through this SLR, we wanted to analyze existing studies, including their motivations, MDE solutions, evaluation techniques, key benefits and limitations. Results: We analyzed selected studies with respect to several areas of interest and identified the following: 1) the key motivations behind using MDE4ML; 2) a variety of MDE solutions applied, such as modeling languages, model transformations, tool support, targeted ML aspects, contributions and more; 3) the evaluation techniques and metrics used; and 4) the limitations and directions for future work. We also discuss the gaps in existing literature and provide recommendations for future research. Conclusion: This SLR highlights current trends, gaps and future research directions in the field of MDE4ML, benefiting both researchers and practitioners
SESep 17, 2025
Monitoring Machine Learning Systems: A Multivocal Literature ReviewHira Naveed, Scott Barnett, Chetan Arora et al.
Context: Dynamic production environments make it challenging to maintain reliable machine learning (ML) systems. Runtime issues, such as changes in data patterns or operating contexts, that degrade model performance are a common occurrence in production settings. Monitoring enables early detection and mitigation of these runtime issues, helping maintain users' trust and prevent unwanted consequences for organizations. Aim: This study aims to provide a comprehensive overview of the ML monitoring literature. Method: We conducted a multivocal literature review (MLR) following the well established guidelines by Garousi to investigate various aspects of ML monitoring approaches in 136 papers. Results: We analyzed selected studies based on four key areas: (1) the motivations, goals, and context; (2) the monitored aspects, specific techniques, metrics, and tools; (3) the contributions and benefits; and (4) the current limitations. We also discuss several insights found in the studies, their implications, and recommendations for future research and practice. Conclusion: Our MLR identifies and summarizes ML monitoring practices and gaps, emphasizing similarities and disconnects between formal and gray literature. Our study is valuable for both academics and practitioners, as it helps select appropriate solutions, highlights limitations in current approaches, and provides future directions for research and tool development.
SEAug 20, 2025
Understanding Practitioners Perspectives on Monitoring Machine Learning SystemsHira Naveed, John Grundy, Chetan Arora et al.
Given the inherent non-deterministic nature of machine learning (ML) systems, their behavior in production environments can lead to unforeseen and potentially dangerous outcomes. For a timely detection of unwanted behavior and to prevent organizations from financial and reputational damage, monitoring these systems is essential. This paper explores the strategies, challenges, and improvement opportunities for monitoring ML systems from the practitioners perspective. We conducted a global survey of 91 ML practitioners to collect diverse insights into current monitoring practices for ML systems. We aim to complement existing research through our qualitative and quantitative analyses, focusing on prevalent runtime issues, industrial monitoring and mitigation practices, key challenges, and desired enhancements in future monitoring tools. Our findings reveal that practitioners frequently struggle with runtime issues related to declining model performance, exceeding latency, and security violations. While most prefer automated monitoring for its increased efficiency, many still rely on manual approaches due to the complexity or lack of appropriate automation solutions. Practitioners report that the initial setup and configuration of monitoring tools is often complicated and challenging, particularly when integrating with ML systems and setting alert thresholds. Moreover, practitioners find that monitoring adds extra workload, strains resources, and causes alert fatigue. The desired improvements from the practitioners perspective are: automated generation and deployment of monitors, improved support for performance and fairness monitoring, and recommendations for resolving runtime issues. These insights offer valuable guidance for the future development of ML monitoring tools that are better aligned with practitioners needs.