DC | DB | LG | Oct 9, 2019

Provenance Data in the Machine Learning Lifecycle in Computational Science and Engineering

arXiv:1910.04223v2 | 42 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of reproducibility and explainability of ML models in computational science and engineering. The advance is incremental because it builds on existing standards such as W3C PROV and ML Schema.

The paper tackles the problem of tracking provenance data across the machine learning lifecycle in computational science and engineering, where complexity arises from diverse data, tools, and workflows. It contributes a new provenance representation (PROV-ML) and extensions to an existing provenance tracking system, and evaluates them in a real Oil and Gas case using 48 GPUs in parallel.

Machine Learning (ML) has become essential in several industries. In Computational Science and Engineering (CSE), the complexity of the ML lifecycle comes from the large variety of data, scientists' expertise, tools, and workflows. If data are not tracked properly during the lifecycle, it becomes infeasible to recreate an ML model from scratch or to explain to stakeholders how it was created. The main limitation of provenance tracking solutions is that they cannot cope with provenance capture and integration of domain and ML data processed in the multiple workflows in the lifecycle while keeping the provenance capture overhead low. To handle this problem, in this paper we contribute a detailed characterization of provenance data in the ML lifecycle in CSE; a new provenance data representation, called PROV-ML, built on top of W3C PROV and ML Schema; and extensions to a system that tracks provenance from multiple workflows to address the characteristics of ML and CSE, and to allow for provenance queries with a standard vocabulary. We show a practical use in a real case in the Oil and Gas industry, along with its evaluation using 48 GPUs in parallel.
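To make the idea of provenance across multiple workflows concrete, the following is a minimal sketch in Python. It is not PROV-ML and not the system described in the paper. It only borrows two core relations from W3C PROV (an activity "used" an entity, an entity "was generated by" an activity). The class `ProvDocument`, its methods, and all identifiers such as `raw_data` and `prepare` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProvDocument:
    """Tiny in-memory store of W3C PROV-style records (illustrative only)."""
    entities: dict = field(default_factory=dict)      # entity id -> attributes
    activities: dict = field(default_factory=dict)    # activity id -> attributes
    used: list = field(default_factory=list)          # (activity id, entity id)
    generated_by: list = field(default_factory=list)  # (entity id, activity id)

    def entity(self, eid, **attrs):
        self.entities[eid] = attrs
        return eid

    def activity(self, aid, **attrs):
        self.activities[aid] = attrs
        return aid

    def record_use(self, aid, eid):
        """Record that activity `aid` used entity `eid`."""
        self.used.append((aid, eid))

    def record_generation(self, eid, aid):
        """Record that entity `eid` was generated by activity `aid`."""
        self.generated_by.append((eid, aid))

    def lineage(self, eid):
        """Return every entity that `eid` transitively depends on."""
        seen = set()
        stack = [eid]
        while stack:
            current = stack.pop()
            for ent, act in self.generated_by:
                if ent != current:
                    continue
                for used_act, src in self.used:
                    if used_act == act and src not in seen:
                        seen.add(src)
                        stack.append(src)
        return seen


# Hypothetical two-workflow lifecycle: data preparation, then training.
doc = ProvDocument()
doc.entity("raw_data", kind="domain data")
doc.entity("training_set", kind="curated data")
doc.entity("hyperparameters", learning_rate=0.01)
doc.entity("model", kind="trained model")

doc.activity("prepare", workflow="data preparation")
doc.activity("train", workflow="model training")

doc.record_use("prepare", "raw_data")
doc.record_generation("training_set", "prepare")
doc.record_use("train", "training_set")
doc.record_use("train", "hyperparameters")
doc.record_generation("model", "train")

model_lineage = doc.lineage("model")
```

Here `model_lineage` contains `raw_data`, `training_set`, and `hyperparameters`, so a query on the trained model reaches back through both workflows to the domain data. A real deployment would need persistent storage, agents, timestamps, and low-overhead capture, none of which this sketch attempts.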

