LG, Nov 13, 2023

How to Do Machine Learning with Small Data? -- A Review from an Industrial Perspective

arXiv:2311.07126v1, 10 citations, h-index: 7
Originality: Synthesis-oriented
AI Analysis

The paper addresses the practical problem of applying machine learning in data-scarce industrial applications; it is an incremental review rather than a novel solution.

The paper reviews the challenges of applying machine learning with limited data in industrial settings. It defines the characteristics of 'small data' and identifies five key challenges: unlabeled data, imbalanced data, missing data, insufficient data, and rare events.

Artificial intelligence has experienced a technological breakthrough in science, industry, and everyday life in recent decades. The advancements can be credited to the ever-increasing availability and miniaturization of computational resources, which resulted in exponential data growth. However, because the amount of data is insufficient in some cases, employing machine learning to solve complex tasks is not straightforward, and sometimes not possible at all. As a result, machine learning with small data is gaining importance in data science and in applications across several fields. The authors focus on interpreting the general term "small data" and its role in engineering and industrial applications. They give a brief overview of the most important industrial applications of machine learning with small data. Small data is defined in terms of various characteristics that distinguish it from big data, and a machine learning formalism is introduced. Five critical challenges of machine learning with small data in industrial applications are presented: unlabeled data, imbalanced data, missing data, insufficient data, and rare events. Based on those definitions, an overview of the considerations in domain representation and data acquisition is given, along with a taxonomy of machine learning approaches in the context of small data.

