SECYLGMar 24, 2023

An investigation of licensing of datasets for machine learning based on the GQM model

arXiv:2303.13735v12 citationsh-index: 15
Originality Synthesis-oriented
AI Analysis

This addresses licensing issues for machine learning developers, but it is incremental as it applies an existing model to a known problem.

The paper investigates the problem of incomplete licensing in machine learning datasets, finding that most datasets lack licenses, which hinders determining commercial availability. It proposes a scientific approach using the GQM model to improve licensing practices for future developers.

Dataset licensing is currently an issue in the development of machine learning systems. And in the development of machine learning systems, the most widely used are publicly available datasets. However, since the images in the publicly available dataset are mainly obtained from the Internet, some images are not commercially available. Furthermore, developers of machine learning systems do not often care about the license of the dataset when training machine learning models with it. In summary, the licensing of datasets for machine learning systems is in a state of incompleteness in all aspects at this stage. Our investigation of two collection datasets revealed that most of the current datasets lacked licenses, and the lack of licenses made it impossible to determine the commercial availability of the datasets. Therefore, we decided to take a more scientific and systematic approach to investigate the licensing of datasets and the licensing of machine learning systems that use the dataset to make it easier and more compliant for future developers of machine learning systems.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes